Pydonts
23-08-2021
Contents
Foreword
Naming matters
• Introduction
• Naming conventions
• PEP 8 recommendations
• Standard names
• Verbosity
• Picking a name
• Context is key
• Practical example
• Conclusion
• References
Chaining comparison operators
• Introduction
• Chaining of comparison operators
• Pitfalls
• Ugly chains
• Examples in code
• Conclusion
• References
Deep unpacking
• Introduction
• Assignments
• Examples in code
• Conclusion
• References
Zip up
• Introduction
• How zip works
• Zip is lazy
• Three is a crowd
• Mismatched lengths
Enumerate me
• Introduction
• How enumerate works
• Optional start argument
• Unpacking when iterating
• Examples in code
• Conclusion
• References
Pydon’t disrespect the Zen of Python
To kick off the Pydon’t series, we start with a set of guidelines that all Pythonistas should be aware of: the Zen of Python.
The Zen of Python is like a meta style guide. While you have things like PEP 8 that tell you how you should
format your code, how to name your variables, etc., the Zen of Python provides you with the guidelines that
you should follow when thinking about (Python) code and when designing a program.
Zen of Python
You can read the Zen of Python by executing import this in your REPL, which should print the following
text:
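The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!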
References
• PEP 20 – The Zen of Python, https://www.python.org/dev/peps/pep-0020/
• “The Way of Python” mailing thread, https://groups.google.com/g/comp.lang.python/c/B_VxeTBClM0/m/L8W9KlsiriUJ
• Tim Peters (software engineer), Wikipedia, https://en.wikipedia.org/wiki/Tim_Peters_(software_engineer)
Introduction
At the time of writing this Pydon’t, I am finishing the preparation of my Python conference talk “Pydon’ts” at
EuroPython.
For that reason, today’s Pydon’t will be a bit different. Usually, I write about using Python’s core features to write idiomatic, expressive, elegant Python code. In this Pydon’t, I will share with you why this is important.
Beware, opinions ahead
Idiomatic code, readable code, “Pythonic” code, elegant code, these are all subjective things. That means
that whatever I write about these topics will never be 100% consensual. In other words, you might disagree.
I am fine with the fact that there are people who disagree with me, and I do invite you to make yourself heard, maybe by writing to me or leaving a comment on the blog – diversity of points of view is enriching.
I just want to let you know that this Pydon’t might not be a good read for you if you can’t stand the fact that other people might think differently from you.
Conclusion
Part of the elegance in your Python programs will come naturally from learning about the features that Python
has to offer, about the built-ins, the modules in the standard library, etc.
References
• Edsger W. Dijkstra (2012), “Selected Writings on Computing: A Personal Perspective”, p.347, Springer Science & Business Media;
• Aaron W. Hsu (2017), “Beginner Patterns vs Anti-patterns in APL”, FnConf’17, https://www.youtube.com/watch?v=v7Mt0GYHU9A [last accessed 06-07-2021];
• Tim Peters (2002), Python mailing list thread, https://mail.python.org/pipermail/python-list/2002-December/134521.html [last accessed 06-07-2021];
Introduction
The overall style of your code can have a great impact on its readability. And code is read more often than it is written, so you (and others!) have a lot to gain from writing well-styled code.
In this Pydon’t, you will:
• understand the importance of having a consistent style; and
• learn about tools that help you with your code style.
By the way, this week I wrote a shorter and lighter Pydon’t, as I am still investing lots of time preparing for EuroPython 2021 at the time of writing… I hope you still find it useful!
Code style
Consistency
Humans are creatures of habit: the first leg that goes into your trousers is always the same, and you always start brushing your teeth on the same side.
These habits automate routines that do not require much attention, so that you can spend your precious
brain power on other things.
As far as my experience goes, the same can be said about your coding style: if you write with a consistent
code style, it becomes easier to read because you already expect a given structure; you are only left with
acquiring the information within that structure.
Otherwise, if your style isn’t consistent, you have to spend more precious brain power parsing the structure
of what you are reading and only then apprehend the information within that structure.
PEP 8 is a document whose purpose is to outline a style guide for those who write Python code. It has plenty
of useful recommendations. However, right after the introduction, PEP 8 reads:
“A style guide is about consistency. Consistency with this style guide is important. Consistency
within a project is more important. Consistency within one module or function is the most important.
However, know when to be inconsistent – sometimes style guide recommendations just aren’t
applicable. When in doubt, use your best judgment. Look at other examples and decide what
looks best. And don’t hesitate to ask!”
This is very important: PEP 8 is a style guide that contains recommendations, not laws or strict rules. And
what is more, notice that there is a strong focus on consistency. Using your own (possibly weird) style
consistently is better than using no style at all. That’s if you are working alone; in a project, it is a good idea
to decide on a particular style beforehand.
Whitespace matters
When I’m teaching Python, I often do some sort of live coding, where I explain things and type examples,
that I often ask students to type as well. I have noticed that people that are just starting with Python will
Auto-formatters
black
A class of tools that you can use is what are known as (auto-)formatters, of which black is a prime example (see its repository at https://github.com/psf/black).
Auto-formatters like black take your code and reformat it so that it fits within the style that the tool sup-
ports/you configure.
pycodestyle
pycodestyle checks if your style is similar to what PEP 8 recommends. In fact, pycodestyle used to be called pep8, but was renamed so that people understand that:
1. PEP 8 isn’t a set of rigid rules; and
2. pycodestyle doesn’t match PEP 8’s recommendations 100%.
Let me modify the file my_f.py to the following:
# In my_f.py
import os, time
def f(a, b, x):
    return a * x + b
If I run pycodestyle, this is what I get as output:
> python -m pycodestyle my_f.py
my_f.py:2:10: E401 multiple imports on one line
my_f.py:3:1: E302 expected 2 blank lines, found 0
We can see that pycodestyle complained about a couple of things:
1. the fact that I merged import os and import time; and
2. the fact that there aren’t enough empty lines separating the imports from f.
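For comparison, here is one way of rewriting my_f.py that addresses both complaints: one import per line, and two blank lines between the imports and the top-level function. Treat it as a sketch; its pycodestyle output is not shown here.

```python
# In my_f.py
import os
import time


def f(a, b, x):
    return a * x + b
```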
-------------------------------------
Your code has been rated at -20.00/10
We can see that pylint was even less forgiving, complaining that I did not include docstrings and that I used 1-letter names. This might be something you appreciate! Or not!
I reckon personal taste plays a big role in picking these tools.
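To give an idea of what addressing those two complaints might look like, here is a hypothetical rewrite of my_f.py that adds docstrings and replaces the 1-letter names (it also drops the unused imports). The names linear, slope, intercept, and point are my own choices for illustration, and I have not re-run pylint on this version, so its score is not claimed here.

```python
"""Helpers for evaluating straight lines."""


def linear(slope, intercept, point):
    """Return the value of the line slope * x + intercept at x = point."""
    return slope * point + intercept
```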
Installing pylint can be done through
python -m pip install pylint
Conclusion
As far as these tools are concerned, I suggest you pick something that is fairly consensual for your personal
projects, so that it doesn’t hurt you too much when you contribute to other projects. For open source projects,
you will often be asked to follow a given style, and there may or may not be tools that help you reformat your
code to follow that style.
This Pydon’t was not supposed to be a thorough review of all the possibilities out there; I only touched upon a couple of popular alternatives, and their popularity might be a decent indicator of what is consensual.
By the way, many IDEs these days have integrated support for these linters, making it even easier to harness
their helpful suggestions.
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Pay attention to the style with which you write code and pick a suite of tools to help you if you
want/need.”
This Pydon’t showed you that:
• coding style has an impact on code readability;
• tools like black and pycodestyle can help you fix the style of your code; and
• linters like flake8 and pylint can give further insights into some types of errors/bugs/problems your
programs might have.
References
• PEP 8 – Style Guide for Python Code, https://www.python.org/dev/peps/pep-0008 [last accessed 20-07-2021];
• black – The Uncompromising Code Formatter, https://github.com/psf/black [last accessed 20-07-2021];
Introduction
Names are like real estate in our brains. They are labels we give to things, concepts, ideas, objects, so that it
is easier for us to refer to those things, but they take up space. As it turns out, we can only hold a very small
number of different ideas in our heads, so these labels are very expensive…
We might as well do the best job we can to manage them as well as possible!
In this Pydon’t, you will:
• learn about some naming conventions;
• learn the suggested naming conventions for Python code; and
• understand some do’s and don’ts for naming variables, functions, methods, etc., in Python.
Naming conventions
When we talk about names, there are two things that need to be discussed. One of them is the actual name
that you give to things, and the other is the way in which you write the name: the casing of the letters and
how consecutive words are separated.
These are often referred to as naming conventions, and there are a few of them. I will present them here, so
that I can refer to them later in the Pydon’t.
The list that follows is not comprehensive, in that there are more naming conventions out there. However,
they are not relevant for this Pydon’t.
• CAPSLOCK – all letters of all words are upper case and there is nothing to separate consecutive words;
• CAPS_LOCK_WITH_UNDERSCORES – like the one above, but with underscores separating words;
• lowercase – all letters of all words are lower case and there is nothing to separate consecutive words;
• snake_case – like the one above, but with underscores separating words;
• PascalCase – all words are put together, but their initials are capitalised to help you know where one
word ends and the other begins; and
• camelCase – like the one above, except the very first word starts with a lower case letter as well.
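To see the six conventions side by side, here is a small sketch that spells the same two words in each of them. The function render is hypothetical, and it assumes the input words are already lower case (str.capitalize lower-cases the rest of each word, so acronyms like NPC would not survive).

```python
def render(words):
    """Join a list of lowercase words using each naming convention above."""
    return {
        "CAPSLOCK": "".join(word.upper() for word in words),
        "CAPS_LOCK_WITH_UNDERSCORES": "_".join(word.upper() for word in words),
        "lowercase": "".join(words),
        "snake_case": "_".join(words),
        "PascalCase": "".join(word.capitalize() for word in words),
        "camelCase": words[0] + "".join(word.capitalize() for word in words[1:]),
    }


for convention, name in render(["max", "speed"]).items():
    print(f"{convention:>26}: {name}")
```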
On top of these naming conventions, sometimes leading and/or trailing underscores can be added to the
mix. That isn’t strictly related to the naming conventions by themselves, but it is related to the way Python
uses names. In case you need a refresher, I wrote a Pydon’t that teaches you all the usages of the underscore and, in particular, what underscores do for you at the beginning and/or end of a name.
PEP 8 recommendations
PEP 8 is a document – a Python Enhancement Proposal – that contains a style guide for Python, and it is the
most widely accepted and used style guide for Python. In case you don’t know it, it might be worth taking a
look at it.
PEP 8’s section on naming conventions starts by acknowledging that “the naming conventions of Python’s library are a bit of a mess”, so bear
in mind that if you start working on some project that already uses a specific naming convention, you should
stick to it. Remember that being consistent is more important than following the PEP 8 guide.
PascalCase
You will often find the PascalCase convention in class names. That is the most common use case for this convention.
What this means is that your classes will look like:
class Circle(Shape):
    # ...
and
class GameArena:
    # ...
class HumanPlayer:
    # ...
class NPC:
    # ...
class AIPlayer:
    # ...
Notice that the NPC and AIPlayer classes are actually using acronyms: NPC stands for non-player character and AI stands for artificial intelligence. PEP 8 recommends that you capitalise all letters of an acronym in a PascalCase name. Sometimes this makes it look like we are using the CAPSLOCK convention.
Other common use cases for the PascalCase convention include exceptions (which shouldn’t surprise you, because exceptions are classes) and type variables.
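Here is a short sketch with both cases. EmptySequenceError and first are hypothetical names; the "Error" suffix on the exception follows another PEP 8 recommendation for exceptions that are actually errors, and T is a typical short PascalCase name for a type variable.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")  # type variable: PascalCase, often a single capital letter


class EmptySequenceError(ValueError):  # exception: PascalCase, ending in "Error"
    """Raised when a sequence that should have items is empty."""


def first(items: Sequence[T]) -> T:
    """Return the first item of a non-empty sequence."""
    if not items:
        raise EmptySequenceError("expected at least one item")
    return items[0]
```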
snake_case
The snake_case naming convention is the bread and butter of naming in Python. Variables, functions, methods, arguments, they all use the snake_case convention.
So, prefer
def cool_function(cool_argument, optional_info):
    # ...
to things like
def coolFunction(coolargument, optionalinfo):
    # ...
CAPS_LOCK_WITH_UNDERSCORES
This naming convention, that might look a bit clunky to you, is actually used to represent global constants.
Python doesn’t have support for variables that are truly constant – in the sense that trying to change them
would throw an error – and so we use this widely adopted convention that variables that are used as global
constants are named with the CAPS_LOCK_WITH_UNDERSCORES convention.
Generally, you will find these “constants” at the beginning of a file.
For example, I often have a couple of paths defined this way:
IMG_BIN = "images"
LOG_FILE = "logs/app.log"
# ...
Standard names
There are a few situations where certain names are the gold standard.
self
A great example of that is the name of the first parameter of instance methods. By convention, the first parameter of such a method is always called self.
Therefore, do
class Square:
    def __init__(self, side_length):
        # ...
instead of
class Square:
    def __init__(square, side_length):
        # ...
class Square:
    def __init__(a, b):
        # ...
class Square:
    def __init__(bananas, side_length):
        # ...
Notice that all three alternatives above (that I claim you should avoid) are actually functional. Here is an
example:
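A minimal sketch with the bananas variant; the area method is a hypothetical addition, only there so that we can see the instance being used:

```python
class Square:
    def __init__(bananas, side_length):
        # `bananas` receives the instance, exactly as `self` would.
        bananas.side_length = side_length

    def area(bananas):
        return bananas.side_length ** 2


square = Square(3)
print(square.area())  # 9
```

Python does not care what the first parameter is called; only the humans reading the code do.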
cls
In a similar setting, cls is the widely accepted name for the first parameter of class methods.
Class methods are not the regular methods you define when you implement a custom class. Those are called instance methods. Class methods are methods decorated with @classmethod, and their first parameter receives the class itself instead of an instance.
Why cls? Well, class is a keyword, so we can’t really use it as the parameter name. And for some reason, people started sticking to cls instead of something like class_. So, nowadays, class methods name their first parameter cls.
A great example of a class method is the method dict.fromkeys, which you call to initialise a dictionary in
a different way:
>>> dict.fromkeys("abc")
{'a': None, 'b': None, 'c': None}
Class methods are often used to implement different ways of building instances of your classes, and that’s
precisely what is happening in the previous example: we are creating a dictionary (or, in other words, an
instance of the class dict) in a different way from what is the usual way.
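You can do the same in your own classes. In the sketch below, from_area is a hypothetical alternative constructor for a Square class:

```python
class Square:
    def __init__(self, side_length):
        self.side_length = side_length

    @classmethod
    def from_area(cls, area):
        """Alternative constructor: build a square from its area."""
        # Using `cls` instead of hard-coding `Square` means that a subclass
        # calling `from_area` gets an instance of the subclass.
        return cls(area ** 0.5)


square = Square.from_area(16)
print(square.side_length)  # 4.0
```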
Verbosity
Having discussed some of the most widely spread conventions when dealing with names in Python, I will now
share my experience regarding good naming principles.
One thing that is often the object of much worry is the length of the name you are picking. Should you pick a long name that contains much information but is a pain to type? Should you pick a short name that is easy to type but a pain to recall what it is for?
Balance is key, always.
Remember that PEP 8 recommends limiting lines to 79 characters, so if your variables look like
number_of_times_user_tried_to_login_unsuccessfully = 2
then you won’t be able to do much in each line of code you write.
However, if you go down the other extreme, you end up with names that are one, two, three characters long,
and those names won’t tell you anything.
One-char names
At one of the ends of the spectrum are one-character names. One-character names consist of a letter, either
uppercase or lowercase, or the underscore.
One-character names should generally be avoided, because they contain little to no information about what
they refer to. However, there are a couple of exceptions that make some sense in their given contexts.
Whenever you need to assign to a variable, even though you don’t need its value, you could use a sink, and
the one-character name _ is the recommended variable name for a sink. A sink is a variable that is assigned
to even though we don’t care about its value. An example of a sink shows up in unpacking, when you care
about the first and last elements of a list, but not about the middle:
>>> numbers = [42, 10, 20, 73]
>>> first, *_, last = numbers
>>> first
42
>>> last
73
In numerical contexts, n is also a common name for an integer and x for a real number (a float). This might seem silly, but I recommend that you do not use n for values that might not be integers. People
get so used to these conventions that breaking them might mean that understanding your code will take
much longer.
c and z are also occasionally used for complex numbers, but those are conventions that come from the world
of mathematics. In other words, these conventions are more likely to be followed by people that are close to
mathematics/mathematicians.
Still along the lines of conventions drawn from mathematics, i, j, and k, in this order, are often used for
(integer) indices. For example, you often see the following beginning of a for loop:
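As a sketch of the convention, here is a hypothetical function that transposes a matrix given as a list of rows, using i for the row index and j for the column index:

```python
def transpose(matrix):
    """Transpose a rectangular matrix given as a list of rows."""
    rows, cols = len(matrix), len(matrix[0])
    result = [[None] * rows for _ in range(cols)]
    for i in range(rows):      # i indexes the rows
        for j in range(cols):  # j indexes the columns
            result[j][i] = matrix[i][j]
    return result


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

Notice the sink _ in the list comprehension, too: we need to loop cols times, but we never use the loop variable.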
Abbreviations
Abbreviations need to be used sparingly. They might make sense for widely recognised abbreviations…
But that, itself, is a dangerous game to play, because you cannot know what abbreviations the readers of
your code might know.
Something that might be safer is to use abbreviations that are relative to the domain knowledge of the code.
For example, if your code handles a network of logistics drones, at some point it might make sense to use
“eta” – which stands for “estimated time of arrival” – for a variable name that holds the estimated time of
arrival of a drone. But then again, try to reason about whether the readers of your code will be familiar with
the domain-specific lingo or not.
While this first guideline is fairly subjective, there is one type of abbreviation that is definitely a terrible idea,
and that’s non-standard abbreviations. If you can’t Google that abbreviation and get its meaning in the first
couple of results, then that’s not a standard abbreviation, at all.
For example, taking the long variable name from above and abbreviating it is a bad idea:
# Don't
ntutlu = 2
Sentences
Rather than having names like
number_of_times_user_tried_to_login_unsuccessfully = 2
or
def compute_number_of_unsuccessful_login_attempts():
    pass
consider shortening those names, and instead include a comment that gives further context, if needed. As
you will see, more often than not, you don’t even need the extra comment:
# Number of unsuccessful attempts made by the user:
unsuccessful_logins = 2
I mean, we are clearly working with a number, so we can just write:
# Unsuccessful attempts made by the user:
unsuccessful_logins = 2
We also know we are talking about unsuccessful attempts, because that’s in the variable name:
# Attempts made by the user:
unsuccessful_logins = 2
We can either stop at this point or remove the comment altogether if the user is the only entity that could have made login attempts.
For functions, include the extra context in the docstring. This ensures that the helpful context is shown to you and to other users when calling your function: nowadays, IDEs show the docstring of the functions we are calling in our code.
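A sketch of that idea, with a hypothetical count_unsuccessful_logins function: the name stays short, and the docstring carries the extra context (including an assumption about what the argument looks like).

```python
def count_unsuccessful_logins(attempts):
    """Count the unsuccessful login attempts made by the user.

    `attempts` is an iterable of Booleans, one per login attempt,
    where True means that the attempt succeeded.
    """
    return sum(1 for succeeded in attempts if not succeeded)


# The docstring is what IDEs (and the built-in help) show to callers.
print(count_unsuccessful_logins.__doc__)
```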
Picking a name
When picking the actual name for whatever it is that you need to name, remember to:
• pick a name that is consistent in style/wording with your surroundings;
• always use the same vocabulary and spelling;
# Good:
first_colour = "red"
last_colour = "blue"
# (or use `color` in both)
# Bad:
item.has_promotion = True
item.discount_percentage = 30
# Good:
item.has_discount = True  # or item.is_discounted, for example.
item.discount_percentage = 30
• use a name that reflects what we are dealing with, instead of a generic name that reflects the type of
the data.
# Bad:
num = 18
string = "Hello, there."
# Good:
legal_age = 18
greeting = "Hello, there."
For variables, you can also consider a name that reflects a major invariant property of the entity you are
working with. “Invariant” means that it doesn’t change. This is important, otherwise you will have a name
that indicates something when the value itself is something else. I’ll show you an example of this by the end
of the Pydon’t.
Naming variables
Similarly, variables are better named with nouns, when they refer to entities.
For Boolean variables (also known as predicates), adjectives might be a good choice as well, in the sense
that the value of the Boolean reflects the presence or absence of that adjective.
For example:
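A small sketch with hypothetical names, where the Boolean variables are adjectives, so that the condition reads almost like English:

```python
cart = ["apple", "pear"]
empty = not cart  # adjective: True when the cart has no items
paid = False      # adjective: True once the order has been paid for

if not empty and not paid:
    status = "awaiting payment"
else:
    status = "nothing to do"

print(status)  # awaiting payment
```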
Context is key
This has been mentioned heavily throughout this Pydon’t, but I want it to be highlighted even more, so there’s
a heading devoted to just this: context is key.
Remember that the context in which you are writing your code will have a big impact on the names that you pick.
Contexts that matter include the domain(s) that your code belongs to (are you writing software to handle bank
transactions, to manage a network of logistics drones, or are you implementing a game?), the specific module
and functions you are in, and whether or not you are inside a statement like a loop, a list comprehension, or
a try: ... except: ... block.
As an example of how the domain you are working in can drastically affect your naming, consider the following
example, drawn from my experience with mathematics. Sometimes it is useful to be able to add polynomials,
and therefore you might want to implement that function:
def poly_addition(poly1, poly2):
    pass
However, if you are in the context of a module that specialises in working with polynomials, then that function’s
signature could probably be boiled down to:
def add(p, q):
    pass
(p and q are common names for polynomials in mathematics.)
See? Context is key.
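As a sketch, assuming a hypothetical representation where a polynomial is a list of coefficients (lowest degree first), the body of such an add could look like this:

```python
from itertools import zip_longest


def add(p, q):
    """Add two polynomials given as lists of coefficients, lowest degree first."""
    # Pad the shorter polynomial with zero coefficients.
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


# (1 + 2x) + (3 + 5x^2) = 4 + 2x + 5x^2
print(add([1, 2], [3, 0, 5]))  # [4, 2, 5]
```

Inside such a module, p, q, a, and b are perfectly clear names; the same names in an unrelated module would be cryptic.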
Practical example
In my Pydon’ts talk and in the Pydon’t about refactoring, I showed a piece of code written by a beginner and
then proceeded to refactor it little by little. One of the steps was renaming things.
Here is said piece of code:
return "".join(empty)
This is what the code does:
>>> myfunc("abcdef")
'AbCdEf'
>>> myfunc("ABCDEF")
'AbCdEf'
>>> myfunc("A CDEF")
'A CdEf'
It alternates the casing of the characters of the argument.
As an exercise for you, try improving the names in the piece of code above before you keep reading.
Ok, have you had a go at improving the names?
Here are all of the names that show up in the function above:
• myfunc is the function name;
• a is the parameter of the function;
• empty is a list that grows with the new characters of the result; and
• i is the index into the argument string.
Here is a suggestion of improvement:
def alternate_casing(text):
    letters = []
    for idx in range(len(text)):
        if idx % 2 == 0:
            letters.append(text[idx].upper())
        else:
            letters.append(text[idx].lower())
    return "".join(letters)
a is now text
Our function accepts a generic string as input. There is nothing particularly special or interesting about this
string, so perfectly good names include text and string.
I opted for text because it gives off the feeling that we will be working with human-readable strings.
i is now idx
i is a very typical name for an index and I don’t think there was anything wrong with it. I have a personal
preference for the 110% explicit idx for an index, and that is why I went with it.
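If you also want a name for each character, the built-in enumerate hands you the index and the character together. This is a possible next step in the same spirit, not necessarily the version from the talk:

```python
def alternate_casing(text):
    letters = []
    for idx, char in enumerate(text):
        # Even positions are upper-cased, odd positions lower-cased.
        letters.append(char.upper() if idx % 2 == 0 else char.lower())
    return "".join(letters)


print(alternate_casing("abcdef"))  # AbCdEf
```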
Conclusion
Having gone through this Pydon’t, you might be thinking that most of the guidelines in here are fairly subjective, and you are right!
I know it can be frustrating to not have objective rules to pick names for your variables, functions, etc… But you know what they say! Naming things is one of the hardest problems in programming.
Don’t fret: with experience you will become better and better at using good names in your code. And remember, Python reads almost like English, so the names you pick should help with that.
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“While naming can be hard, there are guidelines to help you make the best decisions possible.”
This Pydon’t showed you that:
• consistency with existing code is paramount in naming things;
References
• PEP 8 – Style Guide for Python Code, https://www.python.org/dev/peps/pep-0008 [last accessed 28-07-2021];
• Stack Overflow, “What’s an example use case for a Python classmethod?”, https://stackoverflow.com/q/5738470/2828287 [last accessed 28-07-2021];
• testdriven.io, “Clean Code”, https://testdriven.io/blog/clean-code-python/#naming-conventions [last accessed 10-08-2021].
Introduction
In this Pydon’t we will go over the chaining of comparison operators:
• how they work;
• useful usages; and
• weird cases to avoid.
Pitfalls
Even though this feature looks very sensible, there are a couple of pitfalls you have to look out for.
Non-transitive operators
We saw above that we can use a == b == c to check if a, b and c are all the same. How would you check if
they are all different?
If you thought about a != b != c, then you just fell into the first pitfall!
Look at this code:
>>> a = c = 1
>>> b = 2
>>> if a != b != c:
...     print("a, b, and c all different:", a, b, c)
a, b, and c all different: 1 2 1
The problem here is that a != b != c is a != b and b != c, which checks that b is different from a and
from c, but says nothing about how a and c relate.
From the mathematical point of view, != isn’t transitive, i.e., knowing how a relates to b and knowing how
b relates to c doesn’t tell you how a relates to c. As an example of a transitive operator, take the equality operator ==: if a == b and b == c, then it is also true that a == c.
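So how do you check that a, b, and c are all different? One option, assuming the values are hashable, is to put them in a set and check that no value was lost to a duplicate. The helper all_different is hypothetical:

```python
def all_different(*values):
    """Return True if no two values are equal (assumes hashable values)."""
    return len(set(values)) == len(values)


a = c = 1
b = 2
print(a != b != c)             # True, even though a == c
print(all_different(a, b, c))  # False
```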
Ugly chains
This feature looks really natural, but some particular cases aren’t so great. This is a fairly subjective matter,
but I personally don’t love chains where the operators aren’t “aligned”, so chains like
• a == b == c
• a < b <= c
• a <= b < c
look really good, but in my opinion chains like
• a < b > c
• a <= b > c
• a < b >= c
don’t look that good. One can argue, for example, that a < b > c reads nicely as “check if b is larger than
both a and c”, but you could also write max(a, c) < b or b > max(a, c).
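If you want to convince yourself that the chained version and the max version agree, a brute-force check over a few small integers is enough for a quick sketch:

```python
from itertools import product

# Compare both spellings for every triple of integers in a small range.
for a, b, c in product(range(-2, 3), repeat=3):
    assert (a < b > c) == (max(a, c) < b)

print("a < b > c and max(a, c) < b agree on all tested triples")
```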
Now there are some other chains that are just confusing:
• a < b is True
• a == b in l
• a in l is True
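These are confusing because Python expands them like any other chain, pairing each operator with its neighbours: a < b is True means (a < b) and (b is True), for example. A quick check with hypothetical values shows how far that can be from the “obvious” reading:

```python
a, b = 1, 2
print(a < b is True)         # False: means (a < b) and (b is True)
print((a < b) is True)       # True

items = [1, 2, 3]
x = y = 5
print(x == y in items)       # False: means (x == y) and (y in items)
print((x == y) in items)     # True: True == 1, and 1 is in items

print(1 in items is True)    # False: means (1 in items) and (items is True)
print((1 in items) is True)  # True
```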
Examples in code
Inequality chain
Writing a simple utility function that checks whether a given value is between two bounds becomes really easy, e.g.
def ensure_within(value, bounds):
    return bounds[0] <= value <= bounds[1]
or if you want to be a little bit more explicit, while also ensuring bounds is a vector with exactly two items,
you can also write
def ensure_within(value, bounds):
    m, M = bounds
    return m <= value <= M
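The two versions behave differently when bounds does not have exactly two items: indexing silently ignores the extra items, while unpacking raises a ValueError. In this sketch, the names within_indexed and within_unpacked are hypothetical, chosen only to tell the two versions apart:

```python
def within_indexed(value, bounds):
    return bounds[0] <= value <= bounds[1]


def within_unpacked(value, bounds):
    m, M = bounds  # raises ValueError unless bounds has exactly two items
    return m <= value <= M


print(within_indexed(5, (1, 10, 100)))  # True: the 100 is silently ignored
try:
    within_unpacked(5, (1, 10, 100))
except ValueError as error:
    print("ValueError:", error)
```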
Equality chain
Straight from Python’s enum module, we can find a helper function (that is not exposed to the user), that
reads as follows:
def _is_dunder(name):
    """Returns True if a __dunder__ name, False otherwise."""
    return (len(name) > 4 and
            name[:2] == name[-2:] == '__' and
            name[2] != '_' and
            name[-3] != '_')
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Chaining comparison operators feels so natural, you don’t even notice it is a feature. However,
some chains might throw you off if you overlook them.”
This Pydon’t showed you that:
• you can chain comparisons, and do so arbitrarily many times;
• chains with expressions that have side-effects or with non-deterministic outputs are not equivalent to
the extended version; and
• some chains using is or in can look really misleading.
References
• Python 3 Documentation, The Python Language Reference, https://docs.python.org/3/reference/expressions.html#comparisons;
• Python 3 Documentation, The Python Standard Library, enum, https://docs.python.org/3/library/enum.html;
• Reddit, comment on “If they did make a python 4, what changes from python 3 would you like to see?”, https://www.reddit.com/r/Python/comments/ltaf3y/if_they_did_make_a_python_4_what_changes_from/gowuau5?utm_source=share&utm_medium=web2x&context=3.
Online references last consulted on the 1st of March of 2021.
>>> print(b = 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'b' is an invalid keyword argument for print()
>>> print(b := 3)
3
>>> b
3
As shown in PEP 572, a good usage of assignment expressions can help write better code: code that is
clearer and/or runs faster.
Assignment expressions should be avoided when they make the code too convoluted, even if they save you a couple of lines of code. You don’t want to disrespect the Zen of Python, and the Zen of Python recommends writing readable code.
The snippet of code below checks whether the user wants to quit, without using an assignment expression:
import sys

i = input()
if i[0] == "q" or i == "exit":
    sys.exit()
In my opinion, this version (without :=) is much easier to read than one that squeezes the call to input() into the if condition with :=, even though using := would save one line of code.
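For comparison, here is a sketch of the kind of one-liner being warned against, with the call to input() folded into the if condition. The function wrapper and its read_line parameter are hypothetical additions. They only exist so the snippet can be tried without typing at a prompt:

```python
import sys


def maybe_quit(read_line=input):
    # The assignment expression is buried inside a subscript,
    # so it is easy to miss where i comes from when scanning the condition.
    if (i := read_line())[0] == "q" or i == "exit":
        sys.exit()
```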
However, good uses of assignment expressions can
• make your code faster,
• make it more readable/expressive, and
• make your code shorter.
Examples in code
Here are a couple of examples of good usages of assignment expressions.
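As an illustration of the kind of usage meant here, consider two common patterns. These are illustrative sketches, not the article's own examples. The first binds a regex match object and tests it in the same condition. The second reuses an expensive result inside a list comprehension instead of computing it twice. The names parse_version and expensive are hypothetical:

```python
import re


def parse_version(text):
    # Bind the match object and test it in a single condition.
    if (match := re.search(r"(\d+)\.(\d+)", text)) is not None:
        return int(match.group(1)), int(match.group(2))
    return None


def expensive(x):
    # Stand-in for a costly computation.
    return x * x


# expensive(x) runs once per x; its result is both filtered and kept.
even_squares = [y for x in range(10) if (y := expensive(x)) % 2 == 0]

print(parse_version("Python 3.8"))  # (3, 8)
print(even_squares)  # [0, 4, 16, 36, 64]
```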
Conclusion
Assignment expressions allow the binding of a name to a part of an expression, which can be used to great benefit in clarifying the flow of some programs or saving time on expensive computations, for example. Bad usages of assignment expressions, however, can make code very unreadable, and it is therefore crucial to judge whether or not an assignment expression is a good fit for a particular task.
References
• Python 3 Documentation, What’s New in Python, What’s new in Python 3.8 - Assignment expressions, https://docs.python.org/3/whatsnew/3.8.html#assignment-expressions.
• PEP 572 – Assignment Expressions, https://www.python.org/dev/peps/pep-0572.
• Real Python, “Assignment Expressions: The Walrus Operator”, https://realpython.com/lessons/assignment-expressions/.
Online references consulted on the 26th of January of 2021.
Remarks
Now a couple of remarks about how Truthy and Falsy values work.
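As a quick reference, bool reveals the Truthy or Falsy value of any object. Also note that or returns one of its operands rather than a strict True or False. This is a sketch, not the original remarks, and the sample values are arbitrary:

```python
# Empty containers, zero, and None are Falsy; most other values are Truthy.
falsy_values = [None, False, 0, 0.0, "", [], (), {}, set()]
truthy_values = [True, 1, -1, "0", [0], (None,), {"a": 1}]

print([bool(v) for v in falsy_values])   # all False
print([bool(v) for v in truthy_values])  # all True

# `or` returns the first Truthy operand (or the last operand), not a bool.
print(0 or 3)    # 3
print(0 or 0.0)  # 0.0
```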
Examples in code
Now I will show you some examples of places where using the Truthy and Falsy values of Python objects
allows you to write more Pythonic code.
2D point
Let us implement a simple class to represent points in a 2D plane, which could be an image, a plot or something else. Building on what we already had in the article about __str__ and __repr__, we can add a __bool__ method so that the origin is Falsy and all other points are Truthy:
class Point2D:
    """A class to represent points in a 2D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        """Provide a good-looking representation of the object."""
        return f"({self.x}, {self.y})"

    def __repr__(self):
        """Provide an unambiguous way of rebuilding this object."""
        return f"Point2D({repr(self.x)}, {repr(self.y)})"

    def __bool__(self):
        """The origin is Falsy and all other points are Truthy."""
        return bool(self.x or self.y)  # __bool__ must return an actual bool
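Here is a trimmed-down, self-contained version of the class for trying out the Truthy/Falsy behaviour. Only __init__ and __bool__ are kept, and the sample points are made up:

```python
class Point2D:
    """Minimal 2D point, just enough to try __bool__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __bool__(self):
        # `self.x or self.y` can evaluate to an int (e.g. 3), and
        # Python requires __bool__ to return a bool, hence the bool(...) call.
        return bool(self.x or self.y)


origin = Point2D(0, 0)
point = Point2D(3, 0)
if not origin:
    print("the origin is Falsy")
if point:
    print("(3, 0) is Truthy")
```

Without the bool(...) wrapper, bool(Point2D(3, 0)) would raise a TypeError complaining that __bool__ returned an int.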
Processing data
It is also very common to use Truthy and Falsy values to check whether there is still data to be processed.
For example, when I talked about the walrus operator :=, we saw a while loop vaguely similar to this one:
input_lines = []
while (s := input()):
    input_lines.append(s)
# No more lines to read.
print(len(input_lines))
This while loop essentially reads input lines while there are lines to be read. As soon as the user inputs an
empty line "", the loop stops and we print the number of lines we read:
>>> input_lines = []
>>> while (s := input()):
...     input_lines.append(s)
...
Line 1
Line 2
>>> print(len(input_lines))
2
Another common pattern is when you have a list that contains some data that you have to process, and the list itself gets modified as you process the data.
Consider the following example:
import pathlib

def print_file_sizes(dir):
    """Print file sizes in a directory, recurse into subdirs."""
    paths_to_process = [dir]
    while paths_to_process:  # an empty list is Falsy, so the loop ends when there is no work left
        path, *paths_to_process = paths_to_process
        path_obj = pathlib.Path(path)
        if path_obj.is_file():
            print(path, path_obj.stat().st_size)
        else:
            paths_to_process += path_obj.glob("*")
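The same pattern can be sketched without touching the filesystem: a work list is consumed from the front and refilled, while a while loop keeps going as long as the list is Truthy. The flatten function below is a hypothetical example, not something from the standard library:

```python
def flatten(nested):
    """Flatten arbitrarily nested lists, keeping the original order."""
    result = []
    to_process = [nested]
    while to_process:  # an empty list is Falsy, so this stops when no work is left
        item, *to_process = to_process
        if isinstance(item, list):
            # Put the sub-items at the front so the original order is kept.
            to_process = item + to_process
        else:
            result.append(item)
    return result


print(flatten([1, [2, [3, 4]], 5]))  # [1, 2, 3, 4, 5]
```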
Conclusion
• Python’s Truthy and Falsy values allow you to rewrite common conditions in a way that is more readable
and, therefore, Pythonic.
• You can implement your own Truthy and Falsy values in custom classes by implementing the __bool__
dunder method.
• You should also be careful when checking if a given variable is None or not, and avoid using the Falsy
value of None in those particular cases.
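To make the last point concrete, here is a sketch with hypothetical function names. Relying on the Falsy value of None also catches other Falsy values, like 0, that are perfectly valid inputs:

```python
def describe_count_buggy(count=None):
    if not count:  # 0 is Falsy too, so it also ends up here
        return "no count given"
    return f"count is {count}"


def describe_count(count=None):
    if count is None:  # only the absence of a value ends up here
        return "no count given"
    return f"count is {count}"


print(describe_count_buggy(0))  # no count given
print(describe_count(0))        # count is 0
```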
References
• Python 3 Documentation, The Python Language Reference, Data model, __bool__, https://docs.python.org/3/reference/datamodel.html#object.__bool__.
• Python 3 Documentation, The Python Standard Library, Truth Value Testing, https://docs.python.org/3/library/stdtypes.html#truth-value-testing.
• Python 3 Documentation, The Python Standard Library, Built-in Functions, bool, https://docs.python.org/3/library/functions.html#bool.
• PEP 8 – Style Guide for Python Code, https://www.python.org/dev/peps/pep-0008/.
• Python 3 Documentation, The Python Standard Library, File and Directory Access, pathlib, https://docs.python.org/3/library/pathlib.html.
• Stack Overflow, Listing of all files in directory?, https://stackoverflow.com/a/40216619/2828287.
• Stack Overflow, How can I check file size in Python?, https://stackoverflow.com/a/2104107/2828287.
• freeCodeCamp, Truthy and Falsy Values in Python: A Detailed Introduction, https://www.freecodecamp.org/news/truthy-and-falsy-values-in-python/.
Online references last consulted on the 9th of February of 2021.
Introduction
In this Pydon’t we will go over deep unpacking:
• what it is;
• how it works;
• how to use it to improve code readability; and
• how to use it to help debug your code.
Learning about deep unpacking will be very helpful in order to pave the road for structural pattern matching, a feature to be introduced in Python 3.10.
Assignments
Before showing you how deep unpacking works, let’s have a quick look at two other nice features of Python’s assignments.
Multiple assignment
In Python, multiple assignment is what allows you to write things like
>>> x = 3
>>> y = "hey"
>>> x, y = y, x # Multiple assignment to swap variables.
>>> x
'hey'
>>> y
3
or
>>> rgb_values = (45, 124, 183)
>>> # Multiple assignment unpacks the tuple.
>>> r, g, b = rgb_values
>>> g
124
With multiple assignment you can assign, well, multiple variables at the same time, provided the right-hand
side has as many items as the left-hand side expects.
Starred assignment
Starred assignment, that I covered in depth in this Pydon’t, allows you to write things like
>>> l = [0, 1, 2, 3, 4]
>>> head, *body = l
>>> print(head)
0
>>> print(body)
[1, 2, 3, 4]
>>> *body, tail = l
>>> print(tail)
4
>>> head, *body, tail = l
>>> print(body)
[1, 2, 3]
With starred assignment you can tell Python that you are not sure how many items the right-hand side will
have, but all of them can be stored in a single place.
In loops
Deep unpacking can also be used in the implicit assignments of for loops; it doesn’t have to be in explicit assignments with an equals sign! The examples below will show you that.
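For instance, a for loop can unpack nested tuples directly in its loop target. The colour data below uses the same (name, (r, g, b)) shape as the colour examples in this Pydon’t:

```python
colours = [
    ("AliceBlue", (240, 248, 255)),
    ("DarkCyan", (0, 139, 139)),
]

# The loop target mirrors the shape of each item: (name, (r, g, b)).
for name, (r, g, b) in colours:
    print(f"{name}: r={r}, g={g}, b={b}")
```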
Deep unpacking, when used well, can improve the readability of your code – by removing indexing clutter
and by making the intent more explicit – and can help you test your code for some errors and bugs.
Nothing better than showing you some code, so you can see for yourself.
Examples in code
Increasing expressiveness
Given the RGB values of a colour, you can apply a basic formula to convert it to greyscale, which weighs the
R, G, and B components differently. We could write a function that takes the colour information like we have
been using, and then computes its greyscale value:
def greyscale(colour_info):
    return 0.2126*colour_info[1][0] + 0.7152*colour_info[1][1] + \
        0.0722*colour_info[1][2]
(This formula we are using,
\[ 0.2126R + 0.7152G + 0.0722B, \]
is usually the first step of a slightly more involved formula, but it will be good enough for our purposes.)
Now you can use your function:
colour = ("AliceBlue", (240, 248, 255))
print(greyscale(colour)) # prints 246.8046
But I think we can all agree that the function definition could surely be improved. The long formula with the additions and multiplications doesn’t look very nice. In fact, if we use deep unpacking to extract the r, g, and b values, the formula will be spelled out pretty much like the original mathematical formula I showed:
def greyscale(colour_info):
    name, (r, g, b) = colour_info
    return 0.2126*r + 0.7152*g + 0.0722*b
Catching bugs
I said earlier that deep unpacking can also help you find bugs in your code. It is not hard to believe that the
colours list of the previous example could have come from some other function, for example a function that
scrapes the webpage I have been checking, and creates those tuples with colour information.
Let us pretend for a second that my web scraper isn’t working 100% well yet, and so it ended up producing
the following list, where it read the RGB values of two colours into the same one:
colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
If we were to apply the original greyscale function to colours[0], the function would just work:
def greyscale(colour_info):
    return 0.2126*colour_info[1][0] + 0.7152*colour_info[1][1] + \
        0.0722*colour_info[1][2]

colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
print(greyscale(colours[0]))  # 246.8046
However, if you were to use the function that uses deep unpacking, then this would happen:
def greyscale(colour_info):
    name, (r, g, b) = colour_info
    return 0.2126*r + 0.7152*g + 0.0722*b

colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
print(greyscale(colours[0]))  # raises ValueError: too many values to unpack
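One way to put that ValueError to work is to process the data while setting aside the entries whose shape is wrong. This is a sketch, and the helper name checked_greyscales is hypothetical:

```python
def greyscale(colour_info):
    # Deep unpacking fails loudly if the RGB part does not have exactly 3 items.
    name, (r, g, b) = colour_info
    return 0.2126*r + 0.7152*g + 0.0722*b


def checked_greyscales(colours):
    """Return greyscale values for well-formed entries and the names of malformed ones."""
    values, malformed = {}, []
    for colour_info in colours:
        try:
            values[colour_info[0]] = greyscale(colour_info)
        except ValueError:
            malformed.append(colour_info[0])
    return values, malformed


colours = [
    ("AliceBlue", (240, 248, 255, 127, 255, 212)),
    ("DarkCyan", (0, 139, 139)),
]
values, malformed = checked_greyscales(colours)
print(malformed)  # ['AliceBlue']
```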
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Use deep unpacking to improve readability and to keep the shape of your variables in check.”
This Pydon’t showed you that:
• Python’s assignments have plenty of interesting features;
• deep unpacking can prevent cluttering your code with hardcoded indexing;
• deep unpacking improves the readability of your code; and
• some bugs related to iterable shape can be caught if using deep unpacking.
References
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 3113 – Removal of Tuple Parameter Unpacking, https://www.python.org/dev/peps/pep-3113/;
• Multiple assignment and tuple unpacking improve Python code readability, https://treyhunner.com/2018/03/tuple-unpacking-improves-python-code-readability/#Using_a_list-like_syntax;
• Unpacking Nested Data Structures in Python, https://dbader.org/blog/python-nested-unpacking;
• W3Schools, HTML Color Names, https://www.w3schools.com/colors/colors_names.asp.
Starred Assignment
It is fairly common to have a list or another iterable that you want to split into the first element and the rest. You can do this by using slicing in Python, but the most explicit way is with starred assignments.
This feature was introduced in PEP 3132 – Extended Iterable Unpacking and allows for the following:
>>> l = [1, 2, 3, 4, 5]
>>> head, *tail = l
>>> head
1
>>> tail
[2, 3, 4, 5]
This starred assignment is done by placing one * to the left of a variable name in a multiple assignment, and by having any iterable on the right of the assignment. All other variable names get a single element each, and the variable name with the “star” (the asterisk *) gets all the remaining elements as a list:
>>> string = "Hello!"
>>> *start, last = string
>>> start
['H', 'e', 'l', 'l', 'o']
>>> last
'!'
You can have more than two variable names on the left, but only one asterisk:
>>> a, b, *c, d = range(5) # any iterable works
>>> a
0
>>> b
1
>>> c
[2, 3]
>>> d
4
When you use the starred assignment, the starred name might get an empty list,
>>> a, *b = [1]
>>> a
1
>>> b
[]
and an error is raised if there are not enough items to assign to the names that are not starred:
>>> a, *b = []
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected at least 1, got 0)
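Consider the following implementation of reduce, which relies on indexing and slicing:
def reduce(function, list_):
    """Reduce the elements of the list by the binary function."""
    if not list_:
        raise TypeError("Cannot reduce empty list.")
    value = list_[0]
    list_ = list_[1:]
    while list_:
        value = function(value, list_[0])
        list_ = list_[1:]
    return value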
And here is an equivalent implementation using starred assignment:
def reduce(function, list_):
    """Reduce the elements of the list by the binary function."""
    if not list_:
        raise TypeError("Cannot reduce empty list.")
    value, *list_ = list_
    while list_:
        val, *list_ = list_
        value = function(value, val)
    return value
The usage of the starred assignment here makes it abundantly clear that we wish to unpack the list into an
item to be used now and the rest to be used later.
Another similar example, but with the starred name in the beginning, follows.
    weight = 2
    acc = 0
    for digit in reversed(digits[:-1]):
        value = digit * weight
        acc += (value // 10) + (value % 10)
        weight = 3 - weight  # 2 -> 1 and 1 -> 2
    return (9 * acc % 10) == digits[-1]
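The fragment above looks like a check-digit verification in the style of the Luhn algorithm. Assuming that is its purpose, a self-contained sketch that puts the starred name first could look like this. The name luhn_check is hypothetical:

```python
def luhn_check(digits):
    """Verify the trailing check digit of a list of digits (Luhn-style)."""
    *payload, check_digit = digits  # starred name first, last item on its own
    weight = 2
    acc = 0
    for digit in reversed(payload):
        value = digit * weight
        acc += (value // 10) + (value % 10)  # add the digits of value
        weight = 3 - weight  # 2 -> 1 and 1 -> 2
    return (9 * acc % 10) == check_digit


print(luhn_check([7, 9, 9, 2, 7, 3, 9, 8, 7, 1, 3]))  # True
```

Compared with indexing digits[:-1] and digits[-1], the starred assignment names both parts once, up front.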
References
• PEP 3132 – Extended Iterable Unpacking, https://www.python.org/dev/peps/pep-3132/;
• Python 3.9.1 Documentation, The Python Standard Library, Functional Programming Modules, functools, https://docs.python.org/3/library/functools.html#functools.reduce [consulted on the 12th of January of 2021].
print("Type a positive integer (defaults to 1):")
s = input(" >> ")
if s.isnumeric():
    n = int(s)
else:
    n = 1
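(In the code above, we use the method str.isnumeric to check if the string looks like a valid integer. It is not an exact test: for example, "½".isnumeric() is True, but int("½") raises a ValueError. Try running print(str.isnumeric.__doc__) in your Python REPL.)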
With EAFP, you first try to perform whatever operation it is you want to do, and then use a try block to catch any exception that your operation might raise if it is not successful. In our example, this means we simply try to convert s into an integer and, in case a ValueError exception is raised, we set the default value:
print("Type a positive integer (defaults to 1):")
s = input(" >> ")
try:
    n = int(s)
except ValueError:
    n = 1
We use except ValueError because ValueError is the exception that is raised if you try to convert a string that doesn’t contain an integer into an integer:
>>> int("345")
345
>>> int("3.4")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '3.4'
>>> int("asdf")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'asdf'
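Wrapping both snippets in functions makes it easy to see that the two approaches do not always agree, because str.isnumeric is not an exact test for what int accepts. The names parse_lbyl and parse_eafp are hypothetical:

```python
def parse_lbyl(s, default=1):
    # LBYL: check first, then convert.
    if s.isnumeric():
        return int(s)
    return default


def parse_eafp(s, default=1):
    # EAFP: try to convert, fall back to the default on failure.
    try:
        return int(s)
    except ValueError:
        return default


print(parse_eafp("345"), parse_eafp("3.4"))  # 345 1
print(parse_lbyl(" 7 "), parse_eafp(" 7 "))  # 1 7 (int() ignores surrounding spaces)
```

Going the other way, parse_lbyl("½") raises a ValueError: "½" passes the isnumeric check, but int cannot convert it.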
Avoid redundancy
Sometimes, coding with EAFP in mind allows you to avoid redundancy in your code. Imagine you have a
dictionary from which you want to extract a value associated with a key, but that key might not exist.
With LBYL, you would do something like:
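Here are the two styles side by side, as a sketch with hypothetical function names. In the LBYL version the key is looked up twice, once to check and once to read, which is the redundancy EAFP avoids. For this particular case, dict.get already does the job:

```python
def get_with_lbyl(d, key, default=None):
    # LBYL: look the key up to check it exists, then look it up again to read it.
    if key in d:
        return d[key]
    return default


def get_with_eafp(d, key, default=None):
    # EAFP: look the key up once and handle the missing case.
    try:
        return d[key]
    except KeyError:
        return default
```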
Conclusion
EAFP code is a very good alternative to LBYL code, and it is even superior in various situations, like the ones I mentioned above. When writing code, try to weigh the pros and cons of the several approaches you can take, and don’t forget to consider writing EAFP code!
EAFP is not the absolute best way to go in every single situation, but EAFP code can be very readable and
performant!
References
• PEP 463 – Exception-catching expressions, https://www.python.org/dev/peps/pep-0463/.
• Python 3 Documentation, The Python Standard Library, Debugging and Profiling, timeit, https://docs.python.org/3/library/timeit.html.
• Python 3 Documentation, The Python Tutorial, Errors and Exceptions, https://docs.python.org/3/tutorial/errors.html.
• Microsoft Devblogs, Idiomatic Python: EAFP versus LBYL, https://devblogs.microsoft.com/python/idiomatic-python-eafp-versus-lbyl/.
• Stack Overflow, “What is the EAFP principle in Python?”, https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python.
• Stack Overflow, “Ask forgiveness not permission - explain”, https://stackoverflow.com/questions/11360858/what-is-the-eafp-principle-in-python.
Online references consulted on the 19th of January of 2021.
Introduction
One of the things I appreciate most about Python, when compared to other programming languages, is its
for loops. Python allows you to write very expressive loops, and part of that is because of the built-in zip
function.
In this article you will
• see what zip does;
• get to know a new feature of zip that is coming in Python 3.10;
• learn how to use zip to create dictionaries; and
• see some nice usage examples of zip.
Zip is lazy
One thing to keep in mind is that zip doesn’t create the tuples immediately. zip is lazy, and that means it
will only compute the tuples when you ask for them, for example when you iterate over them in a for loop
(like in the examples above) or when you convert the zip object into a list:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans", "Rivers"]
>>> z = zip(firsts, lasts)
>>> z
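Evaluating z on its own only shows a zip object, not the tuples. This short sketch shows the tuples being produced only when requested, and only once:

```python
firsts = ["Anna", "Bob", "Charles"]
lasts = ["Smith", "Doe", "Evans", "Rivers"]

z = zip(firsts, lasts)  # nothing has been paired yet
first_pair = next(z)    # tuples are produced one at a time, on demand
rest = list(z)          # consumes whatever is left

print(first_pair)  # ('Anna', 'Smith')
print(rest)        # [('Bob', 'Doe'), ('Charles', 'Evans')]
print(list(z))     # [] because the zip object is now exhausted
```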
Three is a crowd
We have seen zip with two arguments, but zip can take an arbitrary number of iterables and will produce tuples of the appropriate size:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> middles = ["Z.", "A.", "G."]
>>> lasts = ["Smith", "Doe", "Evans"]
>>> for z in zip(firsts, middles, lasts):
...     print(z)
...
('Anna', 'Z.', 'Smith')
('Bob', 'A.', 'Doe')
('Charles', 'G.', 'Evans')
Mismatched lengths
zip always produces tuples with as many elements as the number of arguments it received, so what happens if one of the iterables is shorter than the others?
If zip’s arguments have unequal lengths, then zip will keep going until it exhausts one of the iterables. As soon as one iterable ends, zip stops producing tuples:
>>> firsts = ["Anna", "Bob", "Charles"]
>>> lasts = ["Smith", "Doe", "Evans", "Rivers"]
>>> for z in zip(firsts, lasts):
...     print(z)
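Here is the example in runnable form: the extra name in lasts is silently dropped. The strict=True keyword at the end comes from PEP 618, which is the Python 3.10 addition to zip. The version check keeps the sketch runnable on older interpreters:

```python
import sys

firsts = ["Anna", "Bob", "Charles"]
lasts = ["Smith", "Doe", "Evans", "Rivers"]

pairs = list(zip(firsts, lasts))
print(pairs)  # [('Anna', 'Smith'), ('Bob', 'Doe'), ('Charles', 'Evans')]

# Python 3.10+ only: strict=True turns a length mismatch into a ValueError.
if sys.version_info >= (3, 10):
    try:
        list(zip(firsts, lasts, strict=True))
    except ValueError as err:
        print("length mismatch:", err)
```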
Examples in code
Now you will see some usages of zip in actual Python code.
Matching paths
If you are not aware of it, then you might be interested in knowing that Python has a module named pathlib
that provides facilities to deal with filesystem paths.
When you create a path, you can then check if it matches a given pattern:
>>> from pathlib import PurePath
>>> PurePath('a/b.py').match('*.py')
True
>>> PurePath('/a/b/c.py').match('b/*.py')
True
>>> PurePath('/a/b/c.py').match('a/*.py')
False
If you take a look at this match function, you find this:
class PurePath(object):
    # ...
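The body of the method is elided above, but its core idea can be sketched in simplified form: compare path components against pattern components from the right, using zip. The function below is a hypothetical illustration built on fnmatch.fnmatchcase. It is not pathlib's actual code, and it only handles relative patterns:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def simple_match(path, pattern):
    """Simplified take on matching a relative glob pattern against a path."""
    parts = PurePosixPath(path).parts
    pattern_parts = PurePosixPath(pattern).parts
    if len(pattern_parts) > len(parts):
        return False
    # Walk both from the right; zip stops when the (shorter) pattern runs out.
    return all(
        fnmatchcase(part, pat)
        for part, pat in zip(reversed(parts), reversed(pattern_parts))
    )


print(simple_match("a/b.py", "*.py"))       # True
print(simple_match("/a/b/c.py", "b/*.py"))  # True
print(simple_match("/a/b/c.py", "a/*.py"))  # False
```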
writer.writeheader()
writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
The fieldnames variable will establish the header of the CSV file and is then used by the writerow method
to know the order in which the values of the dictionary should be written in the file.
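A self-contained version of that snippet shows the setup the fragment relies on. It assumes an in-memory io.StringIO buffer in place of a real file:

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with newline=""
fieldnames = ["first_name", "last_name"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)

writer.writeheader()  # header row built from fieldnames
writer.writerow({"first_name": "Baked", "last_name": "Beans"})
writer.writerow({"first_name": "Lovely", "last_name": "Spam"})
writer.writerow({"first_name": "Wonderful", "last_name": "Spam"})

print(buffer.getvalue())
```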
The writeheader method is the one that writes the header of the CSV file, and here is what it looks like:
class DictWriter:
    # ...
    def writeheader(self):
        # Map each field name to itself, so the header is written like a data row.
        header = dict(zip(self.fieldnames, self.fieldnames))
        return self.writerow(header)
Basically, what this function is doing is using zip to transform the header names into a dictionary where the
keys and the values are the same, pretending that the header is just a regular data row:
>>> fieldnames = ['first_name', 'last_name']
>>> dict(zip(fieldnames, fieldnames))
{'first_name': 'first_name', 'last_name': 'last_name'}
Therefore, the writeheader method just needs to create this dictionary and can then defer the actual writing to the writerow method.
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“zip is your friend whenever you need to traverse two or more iterables at the same time.”
This Pydon’t showed you that:
• zip can be used to traverse several iterables at the same time;
References
• Python 3 Documentation, The Python Standard Library, zip, docs.python.org/3/library/functions.html#zip
[last accessed 30-03-2021];
• Python 3.10 Documentation, The Python Standard Library, zip, docs.python.org/3.10/library/functions.html#zip
[last accessed 30-03-2021];
• Python 3 Documentation, The Python Standard Library, csv, docs.python.org/3/library/csv.html [last
accessed 30-03-2021].
• Python 3 Documentation, The Python Standard Library, pathlib, docs.python.org/3/library/pathlib.html
[last accessed 30-03-2021].
Introduction
Following up on last week’s Pydon’t about zip, today we are talking about enumerate.
One of the things I appreciate most about Python, when compared to other programming languages, is its
for loops. Python allows you to write very expressive loops, and some of that expressiveness comes from
the built-in enumerate function.
In this article you will
• see what enumerate does;
• take a look at its underrated optional start argument;
• learn a couple of neat use cases for enumerate;
• see some nice examples of code using enumerate.
Deep unpacking
Things can get even more interesting when you use enumerate, for example, on a zip:
>>> # Page where each chapter starts and the final page of the book.
>>> pages = [5, 17, 31, 50]
>>> for i, (start, end) in enumerate(zip(pages, pages[1:]), start=1):
...     print(f"'{i}: {end-start} pages long.'")
...
'1: 12 pages long.'
'2: 14 pages long.'
'3: 19 pages long.'
(Here I explicitly named the start= argument in the enumerate so that it was visually easier to separate it
from the argument to zip.)
This code snippet takes a list of pages where chapters of a book start and prints the length of each chapter.
Notice how enumerate returns tuples with indices and values, but those values are extracted from a zip,
which itself returns tuples:
>>> # Page where each chapter starts and the final page of the book.
>>> pages = [5, 17, 31, 50]
>>> for tup in enumerate(zip(pages, pages[1:]), start=1):
...     print(tup)
...
(1, (5, 17))
(2, (17, 31))
(3, (31, 50))
What we do is use deep unpacking to access all these values directly:
>>> # Page where each chapter starts and the final page of the book.
>>> pages = [5, 17, 31, 50]
>>> for tup in enumerate(zip(pages, pages[1:]), start=1):
...     i, (start, end) = tup
...     print(f"'{i}: {end-start} pages long.'")
...
'1: 12 pages long.'
Examples in code
Now you will see some usages of enumerate in real Python code.
Vanilla enumerate
I took a look at the Python Standard Library and by and large the most common usage of enumerate is just
a vanilla enumerate(iter) to access iterable values and indices at the same time. Let me share a textbook
example with you:
The doctest module allows you to write simple tests for your code inside the docstrings for your functions,
classes, etc. The way you write these tests is in the form of an interactive session in the REPL. doctest then
locates those “interactive sessions” in your docstrings and plays them to see if the actual output of the code
matches what your docstring showed.
If you open your Python REPL, you will see that it starts with the prompt >>>, which has a blank space after the triple >. You cannot delete that blank space; it is part of the prompt. When parsing a docstring to extract the actual tests, the parser checks whether each prompt is followed by that blank space, and here is the code that does it:
# from Lib\doctest.py in Python 3.9
class DocTestParser:
    # ...
def sum_nats(n):
    """
    >>> sum_nats(1)
    1
    >>> sum_nats(10)
    55
    >>>sum_nats(100)
    5050
    """
    return n * (n + 1) // 2  # integer division avoids float rounding for large n

if __name__ == "__main__":
    import doctest
    doctest.testmod()
Notice how I intentionally wrote the third example without a space between >>> and sum_nats(100). Running this script should raise a ValueError, which goes away once you put a blank space there.
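The doctest check itself is elided above, but its idea can be sketched in simplified form: pair each line with its number via enumerate, so the error can say where the problem is. This is not doctest's actual code, and the function name is hypothetical:

```python
def find_prompts_without_blank(docstring):
    """Return the 1-based line numbers where a '>>>' prompt lacks a trailing blank."""
    bad_lines = []
    for lineno, line in enumerate(docstring.splitlines(), start=1):
        stripped = line.lstrip()
        # A bare '>>>' is allowed; '>>>code' (no blank) is flagged.
        if stripped.startswith(">>>") and len(stripped) > 3 and stripped[3] != " ":
            bad_lines.append(lineno)
    return bad_lines


doc = "\n".join([
    "Example:",
    "    >>> sum_nats(1)",
    "    1",
    "    >>>sum_nats(100)",
    "    5050",
])
print(find_prompts_without_blank(doc))  # [4]
```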
If I tell the Calendar class to start counting weeks on Sundays (day 6), like my desktop calendar does, here
is what itermonthdays produces:
>>> for d in c.Calendar(6).itermonthdays(2021, 4):
...     print(d)
...
0
0
0
0
1
2
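Because itermonthdays pads the first and last weeks with zeros, each day's position in that stream tells you its column in a calendar grid. Here is a small sketch that uses enumerate to recover those columns. The helper day_columns is hypothetical:

```python
import calendar


def day_columns(year, month, firstweekday=6):
    """Map each day of the month to its column (0-6) in a calendar grid."""
    cal = calendar.Calendar(firstweekday)  # 6 means weeks start on Sunday
    columns = {}
    for i, day in enumerate(cal.itermonthdays(year, month)):
        if day:  # itermonthdays yields 0 for padding days outside the month
            columns[day] = i % 7
    return columns


cols = day_columns(2021, 4)
print(cols[1])  # 4: April 1, 2021 falls in the fifth column (Thursday)
```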
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“enumerate is your best friend if you need to traverse an iterator to deal with its data and also
need access to information about its index.”
This Pydon’t showed you that:
• enumerate gives you access to an iterable’s elements and indices at the same time;
• enumerate by itself returns a lazy enumerate object that must be then iterated or converted explicitly
to a list (or something else that suits your needs) if you want its values;
• enumerate takes a second argument to set an offset for the indexing;
– and, in particular, that argument can be a negative integer;
• the result of enumerate can be fed directly to dict to create a dictionary whose keys are the indices;
• using enumerate we get a nice idiom to find the indices of an iterable that point to the elements that
satisfy a given condition; and
• coupling zip, enumerate, and deep unpacking allows you to loop over several iterables elegantly.
Examples in code
datetime
Python’s datetime module supplies classes for manipulating dates and times. A simple date could be created
like so:
>>> import datetime
>>> date = datetime.datetime(2021, 2, 2)
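Here is a quick sketch showing the two representations of that date side by side. The expected outputs are what CPython prints for this particular datetime:

```python
import datetime

date = datetime.datetime(2021, 2, 2)
print(str(date))   # 2021-02-02 00:00:00  (readable)
print(repr(date))  # datetime.datetime(2021, 2, 2, 0, 0)  (rebuildable)
```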
2D point
An example custom usage of the __str__ and __repr__ dunder methods could come into play if you were
to implement a simple class that represents 2D points, for example because you have to deal with images
or a game or maps, or whatever your use case is.
Ignoring all other methods you would certainly implement, your class could look like this:
class Point2D:
    """A class to represent points in a 2D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        """Provide a good-looking representation of the object."""
        return f"({self.x}, {self.y})"

    def __repr__(self):
        """Provide an unambiguous way of rebuilding this object."""
        return f"Point2D({repr(self.x)}, {repr(self.y)})"
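Here is a short usage sketch with a self-contained copy of the class and a made-up point. It shows which method each operation calls:

```python
class Point2D:
    """A class to represent points in a 2D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f"({self.x}, {self.y})"

    def __repr__(self):
        return f"Point2D({repr(self.x)}, {repr(self.y)})"


p = Point2D(1, 2)
print(p)        # (1, 2): print uses __str__
print(repr(p))  # Point2D(1, 2)
print([p])      # [Point2D(1, 2)]: containers show the repr of their items
```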
References
• Python 3 Documentation, The Python Language Reference, Data model, __repr__ and __str__, https://docs.python.org/3/reference/datamodel.html#object.__repr__.
• Python 3 Documentation, The Python Standard Library, Built-in Functions, https://docs.python.org/3/library/functions.html.
• Python 3 Documentation, The Python Standard Library, Built-in Types, str, https://docs.python.org/3/library/stdtypes.html#str.
• PEP 3140 – str(container) should call str(item), not repr(item), https://www.python.org/dev/peps/pep-3140/.
• Stack Overflow, “Purpose of Python’s repr”, https://stackoverflow.com/questions/1984162/purpose-of-pythons-repr.
• dbader.org, “Python String Conversion 101: Why Every Class Needs a “repr””, https://dbader.org/blog/python-repr-vs-str.
Online references last consulted on the 2nd of February of 2021.
This book is a WIP. Check online to get updates for free. 100
Structural pattern matching tutorial
Introduction
Structural pattern matching is coming to Python, and while it may look like a plain switch statement like many
other languages have, Python’s match statement was not introduced to serve as a simple switch statement.
PEPs 634, 635, and 636 have plenty of information on what structural pattern matching is bringing to Python,
how to use it, the rationale for adding it to Python, etc. In this article I will try to focus on using this new
feature to write beautiful code.
At the time of writing, Python 3.10 is still a pre-release, so you have to look in the right place if you want to
download Python 3.10 and play with it.
def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n-1)

factorial(5)  # 120
Instead of using an if statement, we could use a match:
def factorial(n):
    match n:
        case 0 | 1:
            return 1
        case _:
            return n * factorial(n - 1)

factorial(5)  # 120
Notice a couple of things here: we start our match statement by typing match n, meaning we will want to do different things depending on what n is. Then, we have case statements that can be thought of as the different possible scenarios we want to handle. Each case must be followed by a pattern that we will try to match n against.
Patterns can also contain alternatives, denoted by the | in case 0 | 1, which matches if n is either 0 or 1.
The second pattern, case _:, is the go-to way of matching anything (when you don’t care about what you are
matching), so it is acting more or less like the else of the first definition.
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""
    match colour:
        case (r, g, b):
            name = ""
            a = 0
        case (r, g, b, a):
            name = ""
        case (name, (r, g, b)):
            a = 0
        case (name, (r, g, b, a)):
            pass
        case _:
            raise ValueError("Unknown colour info.")
    return (name, (r, g, b, a))
# Prints ('', (240, 248, 255, 0))
print(normalise_colour_info((240, 248, 255)))
# Prints ('', (240, 248, 255, 0))
print(normalise_colour_info((240, 248, 255, 0)))
# Prints ('AliceBlue', (240, 248, 255, 0))
print(normalise_colour_info(("AliceBlue", (240, 248, 255))))
# Prints ('AliceBlue', (240, 248, 255, 0.3))
print(normalise_colour_info(("AliceBlue", (240, 248, 255, 0.3))))
Notice here that each case contains an expression like the left-hand side of an unpacking assignment, and when the structure of colour matches the structure that the case exhibits, the corresponding values get assigned to the variable names in the case.
This is a great improvement over the equivalent code with if statements:
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""
    if len(colour) == 3:
        r, g, b = colour
        name = ""
        a = 0
    elif len(colour) == 4:
        r, g, b, a = colour
        name = ""
    elif len(colour) != 2:
        raise ValueError("Unknown colour info.")
    else:
        name, values = colour
        if not isinstance(values, (list, tuple)) or len(values) not in [3, 4]:
            raise ValueError("Unknown colour info.")
        elif len(values) == 3:
            r, g, b = values
            a = 0
        else:
            r, g, b, a = values
    return (name, (r, g, b, a))
I tried writing a decent, equivalent piece of code to the one using structural pattern matching, but this doesn’t
look that good. Someone else has suggested, in the comments, another alternative that also doesn’t use
match. That suggestion looks better than mine, but is much more complex and larger than the alternative
with match.
The match version becomes even better when we add type validation to it, by asking for the specific values
to actually match Python’s built-in types:
def normalise_colour_info(colour):
    """Normalise colour info to (name, (r, g, b, alpha))."""
    match colour:
        case (int(r), int(g), int(b)):
            name = ""
            a = 0
        case (int(r), int(g), int(b), int(a)):
            name = ""
        case (str(name), (int(r), int(g), int(b))):
            a = 0
        case (str(name), (int(r), int(g), int(b), int(a))):
            pass
        case _:
            raise ValueError("Unknown colour info.")
    return (name, (r, g, b, a))
class Point2D:
    # ...
    def __str__(self):
        """Provide a good-looking representation of the object."""
        return f"({self.x}, {self.y})"

    def __repr__(self):
        """Provide an unambiguous way of rebuilding this object."""
        return f"Point2D({repr(self.x)}, {repr(self.y)})"
Imagine we now want to write a little function that takes a Point2D and writes a little description of where the point lies. We can use pattern matching to capture the values of the x and y attributes and, what is more, we can use short if clauses (called guards) to help narrow down the type of matches we want to succeed!
Take a look at the following:
def describe_point(point):
    """Write a human-readable description of the point position."""
    match point:
        case Point2D(x=0, y=0):
            desc = "at the origin"
        case Point2D(x=0, y=y):
            desc = f"in the vertical axis, at y = {y}"
        case Point2D(x=x, y=0):
            desc = f"in the horizontal axis, at x = {x}"
        case Point2D(x=x, y=y) if x == y:
            desc = f"along the x = y line, with x = y = {x}"
        case Point2D(x=x, y=y) if x == -y:
            desc = f"along the x = -y line, with x = {x} and y = {y}"
        case Point2D(x=x, y=y):
            desc = f"at {point}"
    return desc
__match_args__
Now, I don’t know if you noticed, but didn’t all the x= and y= in the code snippet above annoy you? Every
time I wrote a new pattern for a Point2D instance, I had to specify what argument was x and what was y. For
classes where this order is not arbitrary, we can use __match_args__ to tell Python how we would like match
to match the attributes of our object.
Here is a shorter version of the example above, making use of __match_args__ to let Python know the order
in which arguments to Point2D should match:
class Point2D:
    """A class to represent points in a 2D space."""

    __match_args__ = ("x", "y")  # Positional sub-patterns match x, then y.

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # (__str__ and __repr__ as defined above.)
def describe_point(point):
    """Write a human-readable description of the point position."""

    match point:
        case Point2D(0, 0):
            desc = "at the origin"
        case Point2D(0, y):
            desc = f"in the vertical axis, at y = {y}"
        case Point2D(x, 0):
            desc = f"in the horizontal axis, at x = {x}"
        case Point2D(x, y):
            desc = f"at {point}"

    return "The point is " + desc
Wildcards
Another cool thing you can do when matching things is to use wildcards.
Asterisk *
Much like you can do things like
>>> head, *body, tail = range(10)
>>> print(head, body, tail)
0 [1, 2, 3, 4, 5, 6, 7, 8] 9
where the *body tells Python to put in body whatever does not go into head or tail, you can use the * and **
wildcards in patterns. With lists and tuples, * matches the rest of the sequence:
def rule_substitution(seq):
    new_seq = []
    while seq:
        match seq:
            case [x, y, z, *tail] if x == y == z:
                new_seq.extend(["3", x])
            case [x, y, *tail] if x == y:
                new_seq.extend(["2", x])
            case [x, *tail]:
                new_seq.extend(["1", x])
        seq = tail
    return new_seq

seq = ["1"]
print(seq[0])
for _ in range(10):
    seq = rule_substitution(seq)
    print("".join(seq))
"""
Prints:
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211
11131221133112132113212221
"""
This builds the sequence I showed above, where each number is derived from the previous one by looking at
its digits and describing what you see. For example, when you find three equal digits in a row, like
"222", you rewrite them as "32" because you are seeing three twos. With the match statement this becomes
much cleaner. In the case statements above, the *tail part of each pattern matches the remainder of the
sequence, because we only use x, y, and z to match the beginning of the sequence.
When matching dictionaries, the pattern only needs to mention the keys you care about and any extra keys are
ignored, unlike matching with lists or tuples, where the match has to be perfect if no wildcard is mentioned.
Double asterisk **
However, if you want to know what the original dictionary had that was not specified in the match, you can
use a ** wildcard:
d = {0: "oi", 1: "uno"}
match d:
    case {0: "oi", **remainder}:
        print(remainder)
# prints {1: 'uno'}
Finally, you can use this to your advantage if you want to match a dictionary that contains only what you
specified:
d = {0: "oi", 1: "uno"}
match d:
    case {0: "oi", **remainder} if not remainder:
        print("Single key in the dictionary")
    case {0: "oi"}:
        print("Has key 0 and extra stuff.")
# Has key 0 and extra stuff.
You can also use variables to match the values of given keys:
d = {0: "oi", 1: "uno"}
match d:
    case {0: zero_val, 1: one_val}:
        print(f"0 mapped to {zero_val} and 1 to {one_val}")
# 0 mapped to oi and 1 to uno
Naming sub-patterns
Sometimes you may want to match against a more structured pattern, but then give a name to a part of the
pattern, or to the whole thing, so that you have a way to refer back to it. This may happen especially when
your pattern has alternatives, which you add with |:
def go(direction):
    match direction:
        case "North" | "East" | "South" | "West":
            return "Alright, I'm going!"
        case _:
            return "I can't go that way..."
def act(command):
    match command.split():
        case "Cook", "breakfast":
            return "I love breakfast."
        case "Cook", *wtv:
            return "Cooking..."
        case "Go", "North" | "East" | "South" | "West":
            return "Alright, I'm going!"
        case "Go", *wtv:
            return "I can't go that way..."
        case _:
            return "I can't do that..."
from the right.
You can write a little match to deal with this:
import ast

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            match op:
                case ast.Add():
                    sop = "+"
                case ast.Sub():
                    sop = "-"
                case ast.Mult():
                    sop = "*"
                case ast.Div():
                    sop = "/"
                case _:
                    raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

def op_to_str(op):
    ops = {
        ast.Add: "+",
        ast.Sub: "-",
        ast.Mult: "*",
        ast.Div: "/",
    }
    return ops.get(op.__class__, None)

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            sop = op_to_str(op)
            if sop is None:
                raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Structural pattern matching introduces a feature that can simplify and increase the readability
of Python code in many cases, but it will not be the go-to solution in every single situation.”
This Pydon’t showed you that:
• structural pattern matching with the match statement greatly extends the power of the already-existing
starred assignment and structural assignment features;
• structural pattern matching can match literal values and arbitrary patterns;
• patterns can include additional conditions with if statements;
• patterns can include wildcards with * and **;
• match statements are very powerful when dealing with the structure of class instances;
• __match_args__ allows you to define a default order for arguments to be matched in when a custom class
is used in a case; and
• built-in Python classes can be used in case statements to validate types.
References
• PEP 622 – Structural Pattern Matching, https://www.python.org/dev/peps/pep-0622/;
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 635 – Structural Pattern Matching: Motivation and Rationale, https://www.python.org/dev/peps/pep-0635/;
• PEP 636 – Structural Pattern Matching: Tutorial, https://www.python.org/dev/peps/pep-0636/;
• Dynamic Pattern Matching with Python, https://gvanrossum.github.io/docs/PyPatternMatching.pdf;
• Python 3.10 Pattern Matching in Action, YouTube video by “Big Python”, https://www.youtube.com/watch?v=SYTVSeTgL3s.
Structural pattern matching anti-patterns
patterns.)
Introduction
Structural pattern matching is coming to Python, and while it may look like a plain switch statement like many
other languages have, Python’s match statement was not introduced to serve as a simple switch statement.
In the previous Pydon’t I explored plenty of use cases for the new match statement, and in this one I will try to
explore some use cases for which a match is not the answer. This Pydon’t assumes you know how structural
pattern matching works in Python, so if you are unsure how that works feel free to read my “Pattern matching
tutorial for Pythonic code”.
At the time of writing, Python 3.10 is still a pre-release, so you have to look in the right place if you want to
download Python 3.10 and play with it.
Which gives the following two example outputs:
>>> collatz_path(8)
[8, 4, 2, 1]
>>> collatz_path(15)
[15, 46, 23, 70, 35, 106, 53, 160, 80, 40, 20, 10, 5, 16, 8, 4, 2, 1]
If we look at the usage of match above, we see it basically served as a simple switch to match either 0 or 1,
the only two values that the operation n % 2 could result in for a positive integer n. Notice that if we use a
plain if we can write exactly the same code and save one line of code:
def collatz_path(n):
    path = [n]
    while n != 1:
        if n % 2:
            n = 3*n + 1
        else:
            n //= 2
        path.append(n)
    return path
We saved one line of code and reduced the maximum depth of our indentation: with the match we had code
that was indented four times, whereas the implementation with the if only has three levels of depth. When
you only have a couple of options and you are checking for explicit equality, a short and sweet if statement
is most likely the way to go.
Be smart(er)
Sometimes you will feel like you have to list a series of cases and corresponding values, so that you can map
one to the other. However, it might be the case that you could make your life much simpler by looking for an
alternative algorithm or formula and implementing that instead. I’ll show you an example.
In case you never heard of it, Rule 30 is an “elementary cellular automaton”. You can think of it as a rule
that receives three bits (three zeroes/ones) and produces a new bit, depending on the three bits it received.
Automata are really, really interesting, but discussing them is beyond the scope of this article. Let us just look
at a possible implementation of the “Rule 30” automaton:
def rule30(bits):
    match bits:
        case 0, 0, 0:
            return 0
        case 0, 0, 1:
            return 1
        case 0, 1, 0:
            return 1
        case 0, 1, 1:
            return 1
        case 1, 0, 0:
            return 1
        case 1, 0, 1:
            return 0
        case 1, 1, 0:
            return 0
        case 1, 1, 1:
            return 0
This seems like a sensible use of the match statement, except that we just wrote 16 lines of code… Ok, you
are right, let us put together the rules that return the same values, that should make the code shorter:
def rule30(bits):
    match bits:
        # Parentheses matter: | binds more tightly than , inside a pattern.
        case (0, 0, 0) | (1, 0, 1) | (1, 1, 0) | (1, 1, 1):
            return 0
        case (0, 0, 1) | (0, 1, 0) | (0, 1, 1) | (1, 0, 0):
            return 1
Yup, much better. But now we have four options on each case, and I have to squint to figure out where each
option starts and ends, and the long strings of zeroes and ones aren’t really that pleasant to the eye… Can
we make it better?
With just a little bit of research you can find out that the “Rule 30” can be written as a closed formula that
depends on the three input bits, which means we don’t have to match the input bits with all the possible
inputs, we can just compute the output:
def rule30(bits):
    p, q, r = bits
    return (p + q + r + q*r) % 2
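One way to convince yourself that the formula is right is to check it against the explicit table for all eight inputs. Rule 30 is also commonly described as p XOR (q OR r), and the sketch below (with the table stored in a dictionary we named RULE30_TABLE for illustration) checks that the three descriptions agree.

```python
from itertools import product

# The eight cases of Rule 30, as listed in the match statement above.
RULE30_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 0,
}


def rule30(bits):
    p, q, r = bits
    return (p + q + r + q*r) % 2


for bits in product((0, 1), repeat=3):
    p, q, r = bits
    # The closed formula, the table, and p XOR (q OR r) must all agree.
    assert rule30(bits) == RULE30_TABLE[bits] == p ^ (q | r)
print("formula matches all 8 cases")
```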
You might argue that this formula obscures the relationship between the several inputs and their outputs.
You are right in principle, but having the explicit “Rule 30” written out as a match doesn’t tell you much about
why each input maps to each output either way, so why not make it short and sweet?
Basic mappings
Getting from dictionaries
There are many cases in which you just want to take a value in and map it to something else. As an example,
take this piece of code that takes an expression and writes it in prefix notation:
import ast

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            match op:
                case ast.Add():
                    sop = "+"
                case ast.Sub():
                    sop = "-"
                case ast.Mult():
                    sop = "*"
                case ast.Div():
                    sop = "/"
                case _:
                    raise NotImplementedError()
            return f"{sop} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()

def op_to_str(op):
    match op:
        case ast.Add():
            sop = "+"
        case ast.Sub():
            sop = "-"
        case ast.Mult():
            sop = "*"
        case ast.Div():
            sop = "/"
        case _:
            raise NotImplementedError()
    return sop

def prefix(tree):
    match tree:
        case ast.Expression(expr):
            return prefix(expr)
        case ast.Constant(value=v):
            return str(v)
        case ast.BinOp(lhs, op, rhs):
            return f"{op_to_str(op)} {prefix(lhs)} {prefix(rhs)}"
        case _:
            raise NotImplementedError()
getattr
Another useful mechanism that we have available is the getattr function, which is part of a trio of Python
built-in functions: hasattr, getattr and setattr.
I will be writing about this trio in a future Pydon’t; be sure to subscribe to the Pydon’t newsletter so you
don’t miss it! For now, I’ll just show you briefly what getattr can do for you.
I am writing an APL interpreter called RGSPL, and there is a function named visit_F where I need to map
APL primitives like + and - to the corresponding Python functions that implement them. These Python functions,
implementing the behaviour of the primitives, live in the functions.py file. If I were using a match statement,
here is what this visit_F could look like:
import functions

def visit_F(self, func):
    name = func.token.type.lower()  # Get the name of the symbol.
    match name:
        case "plus":
            function = functions.plus
        case "minus":
            function = functions.minus
        case "reshape":
            function = functions.reshape
        case _:
            function = None
    if function is None:
        raise Exception(f"Could not find function {name}.")
    return function
This is a similar problem to the one I showed above, where we wanted to get a string for each type of operator
we got, so this could actually be written with the dictionary mapping. I invite you to do it, as a little exercise.
However, here’s the catch: I still have a long way to go in my RGSPL project, and I already have a couple
dozen of those primitives, so my match statement would be around 40 lines long if I were using that solution,
or 20 lines long if I were using the dictionary solution, with one key-value pair per line.
Thankfully, Python’s getattr can be used to get an attribute from an object, if I have the name of that
attribute. It is no coincidence that the value of the name variable above is supposed to be exactly the same
as the name of the function defined inside functions.py:
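import functions

def visit_F(self, func):
    name = func.token.type.lower()  # Get the name of the symbol.
    # getattr returns the default, None, if functions.py has no such name.
    function = getattr(functions, name, None)
    if function is None:
        raise Exception(f"Could not find function {name}.")
    return function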
class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

foo = Foo(3, 4)
print(getattr(foo, "a"))  # prints 3
bar = Foo(10, ";")
print(getattr(bar, "b"))  # prints ';'
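getattr also accepts an optional third argument, a default value that is returned instead of raising AttributeError when the attribute is missing. Here is a self-contained sketch of name-based dispatch that relies on it, using the standard math module as a stand-in for the functions.py module mentioned above; fetch_function is a made-up name.

```python
import math


def fetch_function(name, module=math):
    """Look up a function by name in a module, failing loudly if absent."""
    function = getattr(module, name, None)  # None if there is no such attribute.
    if not callable(function):  # Also rejects non-callables like math.pi.
        raise LookupError(f"Could not find function {name}.")
    return function


print(fetch_function("sqrt")(16))    # 4.0
print(fetch_function("floor")(2.7))  # 2
try:
    fetch_function("reshape")
except LookupError as err:
    print(err)  # Could not find function reshape.
```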
This goes to show that it is always nice to know the tools you have at your disposal. Not everything has
very broad use cases, but that also means that the more specialised tools are the ones that make the most
difference when they are brought in.
Speaking of knowing your tools, the last use case in this article for which match is a bad alternative is related
to calling different functions when your data has different types.
import functools

@functools.singledispatch
def pretty_print(arg):
    print(arg)

@pretty_print.register(complex)
def _(arg):
    print(f"{arg.real} + {arg.imag}i")

@pretty_print.register(list)
@pretty_print.register(tuple)
def _(arg):
    for i, elem in enumerate(arg):
        print(i, elem)

@pretty_print.register(dict)
def _(arg):
    for key, value in arg.items():
        print(f"{key}: {value}")
And this can then be used exactly like the original function:
>>> pretty_print(3)
3
>>> pretty_print([2, 5])
0 2
1 5
>>> pretty_print(3+4j)
3.0 + 4.0i
The pretty_print example isn’t the best example because you spend as many lines decorating as in defining
the actual subfunctions, but this shows you the pattern that you can now be on the lookout for. You can read
more about singledispatch in the docs.
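Since Python 3.7, register can also read the type from the annotation of the first parameter, which saves passing the type to the decorator. Here is a sketch in that style, where the hypothetical describe function returns strings instead of printing, so the results are easy to check:

```python
import functools


@functools.singledispatch
def describe(arg):
    # Fallback used for any type without a registered implementation.
    return f"something else: {arg!r}"


@describe.register
def _(arg: complex):
    return f"{arg.real} + {arg.imag}i"


@describe.register
def _(arg: list):
    return ", ".join(f"{i}: {elem}" for i, elem in enumerate(arg))


print(describe(3+4j))    # 3.0 + 4.0i
print(describe([2, 5]))  # 0: 2, 1: 5
print(describe("hi"))    # something else: 'hi'
```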
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“The new match statement is great, but that does not mean the match statement will be the best
alternative always and, in particular, the match statement is generally being misused if you use it
as a simple switch.”
This Pydon’t showed you that:
• match isn’t necessarily always the best way to implement control flow;
• short and basic match statements could be vanilla if statements;
• sometimes there is a way to compute what you need, instead of having to list many different cases and
their respective values;
• built-in tools like dict.get and getattr can also be used to fetch different values depending on the
matching key; and
• you can use functools.singledispatch when you need to execute different subfunctions when the
input has different types.
References
• PEP 622 – Structural Pattern Matching, https://www.python.org/dev/peps/pep-0622/;
• PEP 634 – Structural Pattern Matching: Specification, https://www.python.org/dev/peps/pep-0634/;
• PEP 635 – Structural Pattern Matching: Motivation and Rationale, https://www.python.org/dev/peps/pep-0635/;
• PEP 636 – Structural Pattern Matching: Tutorial, https://www.python.org/dev/peps/pep-0636/;
• PEP 443 – Single-dispatch generic functions, https://www.python.org/dev/peps/pep-0443/;
• Python 3 Documentation, The Python Standard Library, getattr,
https://docs.python.org/3/library/functions.html#getattr;
• Python 3 Documentation, The Python Standard Library, functools.singledispatch,
https://docs.python.org/3/library/functools.html#functools.singledispatch;
• Wikipedia, “Collatz Conjecture”, https://en.wikipedia.org/wiki/Collatz_conjecture;
• WolframAlpha, “Rule 30”, https://www.wolframalpha.com/input/?i=rule+30;
• Wikipedia, “Rule 30”, https://en.wikipedia.org/wiki/Rule_30.
Watch out for recursion
Introduction
In this Pydon’t I am going to talk a little bit about when and why recursion might not be the best strategy
to solve a problem. This discussion will entail some particularities of Python, but will also cover broader
topics and concepts that encompass many programming languages. After this brief discussion, I will show
you some examples of recursive Python code and its non-recursive counterparts.
Despite what I said I’ll do, don’t get me wrong: the purpose of this Pydon’t is not to make you dislike
recursion or to say that recursion sucks. I really like recursion and I find it very elegant.
RecursionError
The first thing we will discuss is the infamous recursion depth limit that Python enforces.
If you have no idea what I am talking about, then either you never wrote a recursive function in your life, or
you are really, really good and never made a mistake in your recursive function definitions.
The recursion depth limit is something that makes your code raise a RecursionError if you make too many
recursive calls. To see what I am talking about, just do the following in your REPL:
>>> def f():
...     return f()
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
  File "<stdin>", line 2, in f
  File "<stdin>", line 2, in f
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded
>>>
In many cases this limit is helpful, because it helps you find recursive functions for which you did not define
the base case properly.
There are, however, cases in which 1000 recursive calls isn’t enough to finish your computations. A classical
example is that of the factorial function:
>>> def fact(n):
...     if n == 0:
...         return 1
...     return n*fact(n-1)
...
>>> fact(10)
3628800
>>> fact(2000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in fact
  File "<stdin>", line 5, in fact
  File "<stdin>", line 5, in fact
  [Previous line repeated 995 more times]
  File "<stdin>", line 2, in fact
RecursionError: maximum recursion depth exceeded in comparison
Our function is properly defined, but by default Python does not allow us to make enough recursive calls.
If you must, you can always set your own recursion limit:
>>> import sys
>>> sys.setrecursionlimit(3000)
>>> fact(2000)
33162... # (omitted for brevity)
>>> sys.getrecursionlimit()
3000
Just be careful with it: the documentation of sys.setrecursionlimit warns that a limit that is too high can
lead to a crash, so do not raise it blindly to accommodate an obscenely large number of recursive calls.
Hence, if your function is such that it will be constantly trying to recurse more than the recursion depth
allowed, you might want to consider a different solution to your problem.
process.
In practice, Python intentionally does not do this, and I refer you to the two articles on the Neopythonic blog
(by Guido van Rossum) in the references to read more on why Python does not have such a feature.
Converting recursive functions into tail recursive functions is an interesting exercise and I challenge you to
do so, but you won’t get speed gains for it. However, it is very easy to remove the recursion of a tail recursive
function, and I will show you how to do it in the examples below.
Branching overlap
Another thing to take into account when considering a recursive solution to a problem is: is there going to
be much overlap in the recursive calls?
If your recursive function branches in its recursive calls and the recursive calls overlap, then you may be
wasting plenty of time recalculating the same values over and over again. More often than not this can be
fixed easily, but just because a problem probably has a simple solution, it doesn’t mean you can outright
ignore it.
A classical example of recursion that leads to plenty of wasted computations is the Fibonacci sequence
example:
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
A simple modification to this function shows that there are many recursive calls being made:
call_count = 0

def fibonacci(n):
    global call_count
    call_count += 1
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))
print(call_count)  # 177
If your function is more involved, then the time you waste on recalculations can become unbearable.
A very good example of this distinction popped up when I solved the water bucket riddle: I wanted to write
code that solved (a more generic version of) that riddle where you have a bucket that can hold A litres, another
one that holds B litres, and you have to move water around to get one of the buckets to hold exactly T litres.
The solution can be easily expressed in recursive terms, but my implementation actually used a while loop
and a BFS algorithm.
If you don’t know what this means, the best thing to do is to google it. For example, visit the Wikipedia pages
on Depth-first Search and Breadth-first Search. In a short and imprecise sentence, Depth-First Search (DFS)
means that when you are traversing some structure, you prioritise exploring in depth and only then look
around, whereas in Breadth-First Search (BFS) you first explore the level you are at and only then go a level
deeper.
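To make the distinction concrete, here is a small sketch on a made-up tree stored as a dictionary that maps each node to its children. The two traversals differ only in which end of the collection of pending nodes the next node comes from: a stack (last in, first out) gives depth-first order, while a queue (first in, first out) gives breadth-first order.

```python
from collections import deque

# A made-up tree: each node maps to the list of its children.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}


def dfs(tree, start):
    """Depth-first: always continue from the most recently found node."""
    order, stack = [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so they are visited left to right.
        stack.extend(reversed(tree[node]))
    return order


def bfs(tree, start):
    """Breadth-first: finish the current level before going deeper."""
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


print(dfs(TREE, "root"))  # ['root', 'a', 'a1', 'a2', 'b', 'b1']
print(bfs(TREE, "root"))  # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```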
Examples in code
I will now show some recursive code that can run into some of the problems mentioned above, and will also
share non-recursive versions of those same pieces of code.
Factorials
The toy example of the factorial is great because it lends itself to countless different implementations, and
the ideas that these implementations exhibit can then be adapted to more complex recursions.
The main characteristic here is that the recursion of the factorial is a “linear” recursion, where each call only
performs a single recursive call, and each recursive call is for a simpler problem.
The vanilla recursion follows:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
Like we have seen above, we could use an accumulator to write a tail recursive version of the factorial, even
though Python won’t optimise that in any way:
def factorial(n, partial=1):
    if n <= 1:
        return partial
    return factorial(n-1, n*partial)
Now that we have this function written in a tail recursive way, we can actually remove the recursion altogether
following a simple recipe:
def factorial(n):
    partial = 1
    while n > 1:
        n, partial = n-1, n*partial
    return partial
This is a generic transformation you can do for any tail recursive function and I’ll present more examples
below.
Still on the factorial, because this is a linear recursion (and a fairly simple one, yes), there are many ways
in which this function can be rewritten. I present a couple, pretending for a second that math.factorial
doesn’t exist:
import math

def factorial(n):
    return math.prod(range(1, n+1))

def factorial(n):
    fact = 1
    for i in range(1, n+1):
        fact *= i
    return fact
If you are solving a problem and come up with different solutions, don’t be afraid to try them out.
List sum
You can implement your own sum recursively:
def sum(l):
    if not l:
        return 0
    return l[0] + sum(l[1:])
If you carry a partial sum down the recursive calls, you can make this tail recursive:
def sum(l, partial=0):
    if not l:
        return partial
    return sum(l[1:], l[0] + partial)
From the tail recursive function to the while solution is simple:
def sum(l):
    partial = 0
    while l:
        l, partial = l[1:], l[0] + partial
    return partial
Notice what happened:
• the default value of the auxiliary variable becomes the first statement of the function;
• you write a while loop whose condition is the complement of the base case condition;
• you update your variables just like you did in the tail recursive call, except now you assign them explicitly;
and
• after the while you return the auxiliary variable.
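The same recipe applies to other tail recursive functions. As a sketch, take Euclid's algorithm for the greatest common divisor, whose recursive call is already in tail position and which needs no auxiliary variable: the base case b == 0 becomes the loop condition b != 0, and the recursive call gcd(b, a % b) becomes a simultaneous assignment. (The standard library already has math.gcd; these versions are only for illustration.)

```python
def gcd_recursive(a, b):
    """Tail recursive Euclid's algorithm."""
    if b == 0:
        return a
    return gcd_recursive(b, a % b)


def gcd_loop(a, b):
    """The same function after applying the recipe."""
    while b != 0:          # Complement of the base case condition.
        a, b = b, a % b    # Same updates as in the tail recursive call.
    return a               # Same value the base case returned.


print(gcd_recursive(48, 18), gcd_loop(48, 18))  # 6 6
```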
Of course there are simpler implementations for the sum; the point here is that this transformation is generic
and always works.
Sorting a list
Here is another example where we sort a list with selection sort. First, “regular” recursion:
def selection_sort(l):
    if not l:
        return []
    m = min(l)
    idx = l.index(m)
    return [m] + selection_sort(l[:idx]+l[idx+1:])
Now a tail recursive version:
def selection_sort(l, partial=None):  # partial=[] is bad!
    if partial is None:
        partial = []
    if not l:
        return partial
    m = min(l)
    idx = l.index(m)
    return selection_sort(l[:idx]+l[idx+1:], partial + [m])
In the above we just have to be careful with something: the default value of partial is supposed to be the
empty list, but you should avoid mutable types in your arguments’ default values, so we go with None and
then the very first thing we do is set partial = [] in case it was None.
Finally, applying the recipe, we can remove the recursion:
def selection_sort(l):
    partial = []
    while l:
        m = min(l)
        idx = l.index(m)
        l, partial = l[:idx]+l[idx+1:], partial + [m]
    return partial
Traversing (a directory)
The Depth-first versus Breadth-first distinction is more likely to pop up when you have to traverse something.
In this example, we will traverse a full directory, printing file names and file sizes. A simple, purely recursive
solution follows:
import pathlib

def print_file_sizes(path):
    """Print file sizes in a directory."""
    path_obj = pathlib.Path(path)
    if path_obj.is_file():
        print(path, path_obj.stat().st_size)
    else:
        for path in path_obj.glob("*"):
            print_file_sizes(path)
If you apply that function to a directory tree like this one,
- file1.txt
- subdir1
  | - file2.txt
  | - subdir2
      | - file3.txt
      | - subdir3
          | - deep_file.txt
then, depending on the order in which glob lists the entries of each directory, the first file you see printed
may well be deep_file.txt, because this recursive solution traverses your filesystem depth first. If you
wanted to traverse the directory breadth-first, so that you first found file1.txt, then file2.txt, then
file3.txt, and finally deep_file.txt, you could rewrite your function to look like the following:
import pathlib

def print_file_sizes(dir):
    """Print file sizes in a directory, recurse into subdirs."""
    paths_to_process = [dir]
    while paths_to_process:
        path, *paths_to_process = paths_to_process
        path_obj = pathlib.Path(path)
        if path_obj.is_file():
            print(path, path_obj.stat().st_size)
        else:
            paths_to_process += path_obj.glob("*")
This example that I took from my “Truthy, Falsy, and bool” Pydon’t uses the paths_to_process list to keep
track of the, well, paths that still have to be processed, which mimics recursion without actually having to
recurse.
Keeping branching in check
Overlaps
When your recursive function branches out a lot, and those branches overlap, you can save some
computational effort by saving the values you computed so far. This can be as simple as having a dictionary
inside which you check for known values and where you insert the base cases.
This technique is often called memoisation and will be covered in depth in a later Pydon’t, so
stay tuned!
call_count = 0
fibonacci_values = {0: 0, 1: 1}

def fibonacci(n):
    global call_count
    call_count += 1
    try:
        return fibonacci_values[n]
    except KeyError:
        fib = fibonacci(n-1) + fibonacci(n-2)
        fibonacci_values[n] = fib
        return fib

print(fibonacci(10))
print(call_count)  # 19
Notice that this reduced the recursive calls from 177 to 19. We can even count the number of times we have
to perform calculations:
computation_count = 0
fibonacci_values = {0: 0, 1: 1}

def fibonacci(n):
    try:
        return fibonacci_values[n]
    except KeyError:
        global computation_count
        computation_count += 1
        fib = fibonacci(n-1) + fibonacci(n-2)
        fibonacci_values[n] = fib
        return fib

print(fibonacci(10))
print(computation_count)  # 9
This shows that saving partial results can really pay off!
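The standard library can do this bookkeeping for you: functools.lru_cache(maxsize=None) (or functools.cache, available from Python 3.9) stores the result of each call, keyed on its arguments. Here is a sketch of the Fibonacci function using it; cache_info() reports how many calls were answered from the cache (hits) and how many had to be computed (misses). The 11 misses correspond to n = 0, 1, ..., 10, since here the base cases are not pre-seeded.

```python
import functools


@functools.lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)


print(fibonacci(10))           # 55
print(fibonacci.cache_info())  # hits=8, misses=11
```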
Writing recursive branching as loops
To show you how you can rewrite a recursive, branching function as a function that uses while loops we will
take a look at another sorting algorithm, called merge sort. The way merge sort works is simple: to sort a
list, you start by sorting the first and last halves separately, and then you merge the two sorted halves.
Written recursively, this might look something like this:
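def merge(l1, l2):
    result = []
    while l1 and l2:
        if l1[0] < l2[0]:
            h, *l1 = l1
        else:
            h, *l2 = l2
        result.append(h)
    result.extend(l1)  # One of the two lists is empty,
    result.extend(l2)  # the other contains the larger elements.
    return result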
def merge_sort(l):
    """Sort a list recursively with the merge sort algorithm."""
    # Base case.
    if len(l) <= 1:
        return l
    # Sort first and last halves.
    m = len(l)//2
    l1, l2 = merge_sort(l[:m]), merge_sort(l[m:])
    # Now put them together.
    return merge(l1, l2)
If you don’t want to have all this recursive branching, you can use a generic list to keep track of all the sublists
that are still to be sorted:
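def merge(l1, l2):
    """Merge two lists in order."""
    result = []
    while l1 and l2:
        if l1[0] < l2[0]:
            h, *l1 = l1
        else:
            h, *l2 = l2
        result.append(h)
    result.extend(l1)  # One of the two lists is empty,
    result.extend(l2)  # the other contains the larger elements.
    return result

def merge_sort(l):
    """Sort a list with the merge sort algorithm."""
    to_sort = [l]         # Sublists that still need sorting.
    already_sorted = []   # Sublists that are known to be sorted.
    while to_sort:
        lst, *to_sort = to_sort
        if len(lst) <= 1:
            # Base case: lists with 0 or 1 elements are sorted.
            already_sorted.append(lst)
        else:
            # Split in halves; each half gets sorted later.
            m = len(lst)//2
            to_sort += [lst[:m], lst[m:]]
    # Merge sorted sublists two at a time until only one is left.
    while len(already_sorted) > 1:
        l1, l2, *already_sorted = already_sorted
        already_sorted.append(merge(l1, l2))
    return already_sorted[0]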
If you don’t really know what the h, *l1 = l1, h, *l2 = l2, lst, *to_sort = to_sort and l1, l2,
*already_sorted = already_sorted lines are doing, you might want to have a look at this Pydon’t about
unpacking with starred assignments.
In this particular example, my translation of the merge sort to a non-recursive solution ended up being
noticeably larger than the recursive one. This just goes to show that you need to judge all situations by yourself:
would this be worth it? Is there an imperative implementation that is better than this direct translation? The
answers to these questions will always depend on the programmer and the context they are in.
This also shows that the way you think about the problem has an effect on the way the code looks: even
though this last implementation is imperative, it is a direct translation of a recursive implementation and so
it may not look as good as it could!
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Pydon’t recurse mindlessly.”
This Pydon’t showed you that:
• Python has a hard limit on the number of recursive calls you can make and raises a RecursionError if
you cross that limit;
• Python does not optimise tail recursive calls, and probably never will;
• tail recursive functions can easily be transformed into imperative functions;
• recursive functions that branch can waste a lot of computation if no care is taken;
• traversing something with pure recursion tends to create depth first traversals, which might not be the
optimal way to solve your problem; and
• direct translation of recursive functions to imperative ones and vice-versa will probably produce suboptimal
code, so you need to align your mindset with what you want to accomplish.
References
• Stack Overflow, “What is the maximum recursion depth in Python, and how to increase it?”,
https://stackoverflow.com/questions/3323001/what-is-the-maximum-recursion-depth-in-python-and-how-to-increase-it.
• Stack Overflow, “Does Python optimize tail recursion?”,
https://stackoverflow.com/questions/13591970/does-python-optimize-tail-recursion.
• Neopythonic, Tail Recursion Elimination, http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html.
• Neopythonic, Final Words on Tail Calls, http://neopythonic.blogspot.com/2009/04/final-words-on-tail-calls.html.
• Documentation, The Python Standard Library, Functional Programming Modules, operator,
https://docs.python.org/3/library/operator.html.
Online references last consulted on the 16th of February of 2021.
Sequence indexing
Introduction
Sequences in Python, like strings, lists, and tuples, are objects that support indexing: a fairly simple operation
that we can use to access specific elements. This short article will cover the basics of how sequence indexing
works and then give you some tips regarding anti-patterns to avoid when using indices in your Python code.
In this article you will:
• learn the basic syntax for indexing sequences;
• learn how negative indices work;
• see some tools that are often used to work with sequences and indices; and
• learn a couple of tricks and things to avoid when indexing.
Sequence indexing
First and foremost, I am talking about sequence indexing here to distinguish it from the type of indexing you do
to access the values of a dictionary, where you use keys to index into the dictionary and retrieve its values. In
this article we will be talking about using integers to index linear sequences, that is, sequences that we can
traverse from one end to the other, in an ordered fashion.
A very simple example of such a sequence is a string:
>>> s = "Indexing is easy!"
>>> s
'Indexing is easy!'
To index a specific character of this string I just use square brackets and the integer that corresponds to the
character I want. Python is 0-indexed, which means it starts counting indices at 0. Therefore, the very first
element of a sequence can be obtained with [0]. In our example, this should give a capital "I":
>>> s = "Indexing is easy!"
>>> s[0]
'I'
Then, each following character is obtained by increasing the index by 1:
>>> s = "Indexing is easy!"
>>> s[1]
'n'
>>> s[2]
'd'
>>> s[3]
'e'
Here is a figure that shows how to look at a sequence and figure out which index corresponds to each
element:
Imagine vertical bars that separate consecutive elements, and then number each of those vertical bars,
starting with the leftmost bar. Each element gets the index associated with the bar immediately to its left:
Negative indices
If the last legal index is the length of the sequence minus 1, then there is an obvious way to access the last
item of a sequence:
>>> s = "Indexing is easy!"
>>> s[len(s)-1]
'!'
>>> l = [12, 45, 11, 89, 0, 99]
>>> l[len(l)-1]
99
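To make the idea of the last legal index concrete: indexing with len(s) itself, one position past the last element, raises an IndexError.

```python
s = "Indexing is easy!"
last = len(s) - 1   # 16, the last legal index for this 17-character string
print(s[last])      # !

try:
    s[len(s)]       # one position past the end
except IndexError as err:
    message = str(err)
print(message)      # string index out of range
```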
However, Python provides this really interesting feature where you can use negative indices to count from
the end of the sequence. In order to figure out which negative index corresponds to which element, think
about writing the sequence to the left of itself:
Then you just have to continue the numbering from the right to the left, therefore making use of negative
numbers:
From the figure above you can see that the index -1 refers to the last element of the sequence, the index -2
refers to the second to last, and so on:
>>> s = "Indexing is easy!"
>>> s[-1]
'!'
>>> s[-2]
'y'
We can also take a look at all the negative indices that work for our specific sequence:
Another way to look at negative indices is to pretend there is a len(s) to their left:
Negative index | Corresponding positive index
-1             | len(s) - 1
-2             | len(s) - 2
-3             | len(s) - 3
…              | …
-len(s)        | len(s) - len(s) (same as 0)
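The correspondence in the table can also be checked mechanically: for every k from 1 to len(s), s[-k] and s[len(s) - k] refer to the same element.

```python
s = "Indexing is easy!"

# Check the correspondence from the table for every legal negative index.
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

print(s[-1], s[-len(s)])  # ! I
```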
Indexing idioms
Having seen the basic syntax for indexing, there are a couple of indices that would be helpful if you were
able to read them immediately for what they are, without having to think about them:
You should also be careful about things that you think are like lists, but really are not. These include the
objects returned by enumerate, zip, and map, among others. None of these are indexable, none of them support
len, and so on. Pay attention to that!
>>> l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> e = enumerate(l)
>>> e[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'enumerate' object is not subscriptable
>>> z = zip(l)
>>> z[3]
## ...
TypeError: 'zip' object is not subscriptable
>>> m = map(str, l)
>>> m[3]
## ...
TypeError: 'map' object is not subscriptable
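If you do need a specific element from one of these objects, two common workarounds are sketched below: turn the object into a list first, or use itertools.islice to skip ahead lazily. Keep in mind that these objects are one-shot iterators, so skipping ahead consumes the elements you skipped.

```python
from itertools import islice

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Workaround 1: materialise the map object into a list, which is subscriptable.
as_list = list(map(str, l))
print(as_list[3])   # 3

# Workaround 2: skip ahead lazily. islice(m, 3, None) skips the first three
# items of m, and next() then takes the item at position 3.
m = map(str, l)
fourth = next(islice(m, 3, None))
print(fourth)       # 3
print(next(m))      # 4 (items 0 to 3 of m were already consumed)
```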
This is a naïve solution to the problem of “find unique characters”; you probably want to use a Python set
for a more efficient implementation :)
The problem here is that the for loop is being done in a roundabout way: we have access to a sequence
(the string) that we could iterate over, but instead we find its length, so that we can use range to compute
its legal indices, which we then iterate over, only to then access the elements of the sequence through their
indices.
This way of writing for loops mirrors the way you would iterate over the elements of an array in many other
programming languages.
However, we are using Python, not any other language. One of the things I enjoy the most about Python’s
for loops is that you can directly access the consecutive elements of a sequence. Hence, we can rewrite our
for loop slightly, in a way that makes it much more elegant:
>>> s = "Indexing is easy!"
>>> uniques = []
>>> for letter in s:
...     if letter not in uniques:
...         uniques.append(letter)
...
>>> uniques
['I', 'n', 'd', 'e', 'x', 'i', 'g', ' ', 's', 'a', 'y', '!']
What I really like about these types of loops is that if your variables are named correctly, the statements
express your intent very clearly. The line for letter in s: is read as
“For each letter in (the string) s…”
This type of for loop iterates directly over the values you care about, which is often what you want. If you
care about the indices, then be my guest and use range(len(s))!
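Following up on the earlier remark that a set makes “find unique characters” more efficient: a set has fast membership tests but does not remember the order in which characters first appear. One common pattern, sketched here, pairs a set for the membership check with a list for the result; the names seen and uniques are just illustrative. The last line shows an alternative based on dict.fromkeys, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7).

```python
s = "Indexing is easy!"

seen = set()    # fast membership tests
uniques = []    # keeps first-appearance order, which a bare set does not
for letter in s:
    if letter not in seen:
        seen.add(letter)
        uniques.append(letter)

print(uniques)

# Alternative: dict keys are unique and remember insertion order.
uniques_via_dict = list(dict.fromkeys(s))
```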
Another anti-pattern to be on the lookout for happens when you need to work with the indices and the values.
In that case, you probably want to use the enumerate function. I tell you all about that function in a Pydon’t
of its own, so go check it out if you haven’t already.
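As a quick taste of enumerate when you need both the indices and the values, here is a sketch that collects the positions of every lowercase "i" in our string without going through range(len(s)).

```python
s = "Indexing is easy!"

positions = []
for idx, letter in enumerate(s):   # idx is the index, letter is s[idx]
    if letter == "i":              # lowercase "i" only
        positions.append(idx)

print(positions)  # [5, 9]
```

Both names in the loop header are meaningful, so the intent stays as readable as in the for letter in s: version.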
' '
>>> s[len(s)//2] # Pro-tip: the operation // is ideal here
' '
Where am I going with this?
Take a look at the expression you just used:
s[math.floor(len(s)/2)]
Maybe it is me getting old, but I struggle a bit to read that because of the [] enclosing the expression which
then has a couple of () that I also have to parse, to figure out what goes where.
If you have large expressions to compute indices (and here, large will be subjective), inserting those expressions
directly inside [] may lead to long lines of code that are then complicated to read and understand. If
you have lines that are hard to understand, then you probably need to comment them, creating even more
lines of code.
Another alternative is to create a well-named variable to hold the result of the computation of the new index:
>>> import math
>>> s = "Indexing is easy!"
>>> mid_char_idx = math.floor(len(s)/2)
>>> s[mid_char_idx]
' '
For this silly example, notice that the new variable name is almost as long as the expression itself! However,
s[mid_char_idx] is very, very easy to read and does not need any further comments.
So, if you have large expressions to compute indices, think twice before using them to index directly into the
sequence at hand and consider using an intermediate variable with a descriptive name.
Something you might consider and that adds a bit of clarity to your code is unpacking the names before you
reach the if statement:
def greet(names, formal):
    first, last = names
    if formal:
        return "Hello Miss " + last
    else:
        return "Hey there " + first
Why would this be preferable, if I just added a line of code? It makes the intent of the code much more
obvious. Just from looking at the function as is, you can see from the first line first, last = names that
names is supposed to be a pair with the first and last names of a person and then the if: ... else: ...
is very, very easy to follow because we see immediately that we want to use the last name if we need a
formal greeting, and otherwise (else) we use the first name.
Furthermore, unpacking like so
first, last = names
forces your greet function to expect pairs as the names variable, because a list with fewer or more elements
will raise an error:
>>> first, last = ["Mary", "Anne", "Doe"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)
We are assuming we really are working with pairs, so if the greet function gets something that is not a pair,
this error is useful in spotting a problem in our code. Maybe someone didn’t understand how to use the
function and called it with the first name of the person?
>>> greet("Mary", True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in greet
ValueError: too many values to unpack (expected 2)
This would help you find a location where the greet function was not being properly used.
I have written at length about unpacking in Python (another favourite feature of mine!) so feel free to read
my articles on unpacking with starred assignments and on deep-unpacking.
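As an illustration of the trade-off, here is a hypothetical variant of greet that uses a starred assignment to also accept middle names. It still rejects a list with a single name, but it is less strict than the pair-only version: a plain string like "Mary" is now unpacked character by character instead of raising an error, which is exactly the kind of misuse the original version catches.

```python
def greet(names, formal):
    # Hypothetical variant: exactly one first name and one last name,
    # with any number (possibly zero) of middle names in between.
    first, *middle, last = names
    if formal:
        return "Hello Miss " + last
    else:
        return "Hey there " + first


print(greet(["Mary", "Anne", "Doe"], True))  # Hello Miss Doe
print(greet(["Mary", "Doe"], False))         # Hey there Mary
print(greet("Mary", True))                   # Hello Miss y  (no error!)
```

Calling greet(["Mary"], True) still raises ValueError: not enough values to unpack (expected at least 2, got 1).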
Conclusion
Here’s the main takeaway of this article, for you, on a silver platter:
“Indexing is simple and powerful, but sometimes when indexing looks like the answer, there is
another Python feature waiting to be used.”
This Pydon’t showed you that:
• Indexing in Python is 0-based;
• Python allows negative indices in sequences;
• Using indices in a for loop to access the elements of a sequence is an anti-pattern in Python;
• Using large expressions when indexing bloats your code and you are better off with a descriptive
variable, even if that variable has a long name;
• If you know the exact structure of the sequence you are dealing with, unpacking might be preferable
to indexing.
Idiomatic sequence slicing
Introduction
Last time we went over sequence indexing in Python to cover the basics for our next topic of discussion:
sequence slicing. Slicing is a “more advanced” way of accessing portions of sequences (like lists and tuples).
I say it is more advanced just because indexing is the simplest form of accessing sequence items; as you
will see, slicing isn’t that complicated either.
As it turns out, much can be said about sequence slicing, so I will split all of the contents into two Pydon’ts,
this and the next one.
In this Pydon’t you will:
• learn the slicing syntax;
• learn how slicing works with 1 and 2 parameters;
• relate slices to the range built-in;
• master slicing with negative indices;
• learn to write Pythonic and idiomatic slices; and
• see a couple of common use cases where slicing is not the way to go.
In the next Pydon’t we will continue on this train of thought and cover the more advanced material related
to slicing. In particular, you will:
• learn about the stride parameter in slicing;
• learn about slice assignment;
• see how slicing can be used to copy sequences;
• learn some more idiomatic slicing patterns;
• uncover the two layers of syntactic sugar surrounding list slicing; and
• learn how to implement slicing for your custom objects.
Throughout both Pydon’ts we will try to keep an eye out for how slices are actually used in real-world Python
code, namely in the Python Standard Library.
If you don’t want to miss the next Pydon’t on the more advanced slicing topics, you can either subscribe to
the Pydon’ts newsletter or grab your copy of the Pydon’ts book right now.
Slicing syntax
Slicing in Python is the act of accessing a sequence of elements that are extracted from successive positions
of a larger sequence. Just think of an actual knife cutting through the sequence you are working with (which
could be a string, list, tuple, etc.) and extracting a smaller piece of your sequence.
For example, suppose we are working with the string "Slicing is easy!", which I present below.
Together with the characters of the string, we have the little numbers indicating the index of each character.
Each little number gives the index for the box right in front of it. This is the representation I go to in my
head whenever I have to reason about indices in Python, especially when I am working with negative indices.
(Just take a quick look at this Pydon’t if you need to jog your memory on how indexing is done in Python.)
Now, we could be interested in extracting the portion "icing" from the string:
How would we do that in Python? If you didn’t know how slicing worked, you could come up with a solution
involving a for loop and a range:
>>> s = "Slicing is easy!"
>>> subs = ""
>>> for i in range(2, 7):
...     subs += s[i]
...
>>> subs
'icing'
This is all good, but there is a much shorter syntax for this type of operation, the slicing syntax.
When you want to slice a sequence, you need to use brackets [] and a colon : to separate the start and end
points. The key here is in figuring out what the start and end points are, but that is just a matter of looking
at the figure above or at the solution with the range(2, 7):
>>> s = "Slicing is easy!"
>>> s[2:7]
'icing'
This is the very first important point to make about slicing: the start and end points give you the bars that
enclose what you will extract, which, in other words, means that the start point (2, in the previous example)
is the index of the first element that is included in the slice, whereas the end point is the index of the first
element that is not included in the slice:
>>> s = "Slicing is easy!"
>>> s[2:7]
'icing'
>>> s[7]
' '
Now is a good time to fire up your Python interpreter, define s as the string "Slicing is easy!", and work
out a couple of slices for yourself.
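To convince yourself that s[start:end] selects the same characters as the range-based loop, you can compare both for a few start and end points. The helper slice_via_range is hypothetical and only agrees with slicing for non-negative indices within the string: slicing also clips end points that are too large, whereas s[i] would raise an IndexError.

```python
s = "Slicing is easy!"


def slice_via_range(seq, start, end):
    """Hypothetical helper: seq[start:end] built with range, for 0 <= start <= end <= len(seq)."""
    return "".join(seq[i] for i in range(start, end))


for start, end in [(2, 7), (0, 7), (8, 10), (11, 16)]:
    assert s[start:end] == slice_via_range(s, start, end)
    print(f"s[{start}:{end}] = {s[start:end]!r}")
```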
What to slice?
In case it wasn’t clear earlier, here are some of the things that you can slice in Python:
>>> "Hello"[1:3] # strings
'el'
>>> [True, False, 1, "hey"][1:3] # lists
[False, 1]
>>> (True, False, 1, "hey")[1:3] # tuples
(False, 1)
>>> range(10)[1:3] # ranges
range(1, 3)
>>> # etc...
However, we will be using string examples for most of the Pydon’t, just for the sake of consistency.
If we go back to our naïve range solution and want to extract "Slicing" instead, most of us would write the following:
>>> s = "Slicing is easy!"
>>> subs = ""
>>> for i in range(7):
...     subs += s[i]
...
>>> subs
'Slicing'
Notice that, unlike when we used range(2, 7) for "icing", now our range only has one argument, the end
point. That is because range interprets the missing starting index as 0.
When we are slicing, we can do a similar thing! If we want to extract a portion from the beginning of a
sequence, the Pythonic way of writing that slice is without specifying the explicit 0 as a start point. Therefore,
both alternatives below work, but the second one is the preferred.
>>> s = "Slicing is easy!"
>>> s[0:7] # Works ...
'Slicing'
>>> s[:7] # ... but this is preferred!
'Slicing'
In terms of the figures I have been sharing, think of it like this: it’s like you never tell Python where the slicing
starts, so the bar that is hovering over the string ends up covering the whole beginning of the string, stopping at
the position you indicate.
Similarly, if we don’t indicate the end point for the slice, we extract all elements from the point specified
onwards. Naturally, we can specify the end point of the slice to be the length of the sequence, but that adds
too much visual noise:
>>> s = "Slicing is easy!"
>>> s[7:len(s)] # Works...
' is easy!'
>>> s[7:] # ... but this is preferred!
' is easy!'
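Because s[:n] stops right before the bar at position n and s[n:] starts at that same bar, these two slices split a sequence into two pieces that glue back together into the original, for any n from 0 to len(s):

```python
s = "Slicing is easy!"

for n in range(len(s) + 1):
    # s[:n] and s[n:] split s at bar n; gluing them back gives s.
    assert s[:n] + s[n:] == s

print(repr(s[:7]), repr(s[7:]))  # 'Slicing' ' is easy!'
```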
To illustrate this, here is the representation of the negative indices of the string we have been using so far:
Now, regardless of the fact that the numbers are negative, if you had to tell me where to draw two vertical
bars in order to enclose the substring "icing", what positions would you point to? You would probably tell
me “Draw the bars on positions -14 and -9”, and that would be absolutely correct!
In fact, using -14 and -9 would work in my naïve range solution but also – and most importantly – with the
slice syntax:
>>> s = "Slicing is easy!"
>>> subs = ""
>>> for i in range(-14, -9):
...     subs += s[i]
...
>>> subs
'icing'
>>> s[-14:-9] # Also works and is preferred!
'icing'
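A negative slice parameter p, with -len(s) <= p < 0, refers to the same bar as len(s) + p, which is why s[-14:-9] and s[2:7] are interchangeable for this 16-character string:

```python
s = "Slicing is easy!"
start, end = -14, -9

# Each negative parameter p behaves like len(s) + p.
assert s[start:end] == s[len(s) + start:len(s) + end] == s[2:7]
print(s[start:end])  # icing
```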
Idiomatic slicing patterns
Now that you have taken a look at some basic slicing with positive and negative indices, and now that you
know you can omit the first or the last parameters of your slices, you should learn about four slice patterns
that are really idiomatic. Don’t worry, I’ll show you which four patterns I am talking about.
Suppose you have a variable n that is a positive integer (it may help to think of it as a small integer, like 1 or
2), and suppose s is some sequence that supports slicing. Here are the four idiomatic slicing patterns I am
talking about:
• s[n:]
• s[-n:]
• s[:n]
• s[:-n]
Why are these “idiomatic” slicing patterns? They are idiomatic because, with a little practice, you stop
looking at them as “slice s starting at position blah and ending at position blah blah” and start reading them
for their semantic meaning.
Open your Python interpreter, set s = "Slicing is easy!" and n = 2, and see what the four slices above
return. Experiment with other values of n. Can you give an interpretation for what each slice means?
Go ahead…
Here is what the slicing patterns mean.
s[n:]
If n is not negative (so 0 or more), then s[n:] means “skip the first n elements of s”:
>>> s = "Slicing is easy!"
>>> s[2:]
'icing is easy!'
>>> s[3:]
'cing is easy!'
>>> s[4:]
'ing is easy!'
s[-n:]
If n is positive (so 1 or more), then s[-n:] means “the last n elements of s”:
>>> s = "Slicing is easy!"
>>> s[-2:]
'y!'
>>> s[-3:]
'sy!'
>>> s[-4:]
'asy!'
Be careful with n = 0: because -0 == 0, s[-0:] is the same as s[0:], so we are actually using the previous
slicing pattern, “skip the first n elements”. With n = 0 that means we skip nothing and get the whole
sequence back, not the last 0 elements:
>>> s = "Slicing is easy!"
>>> s[-0:]
'Slicing is easy!'
s[:n]
If n is not negative (so 0 or more), then s[:n] can be read as “the first n elements of s”:
>>> s = "Slicing is easy!"
>>> s[:2]
'Sl'
>>> s[:3]
'Sli'
>>> s[:4]
'Slic'
s[:-n]
Finally, if n is positive (so 1 or more), then s[:-n] means “drop the last n elements of s”:
>>> s = "Slicing is easy!"
>>> s[:-2]
'Slicing is eas'
>>> s[:-3]
'Slicing is ea'
>>> s[:-4]
'Slicing is e'
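Like with the s[-n:] pattern, we need to be careful with n = 0: s[:-0] is the same as s[:0], so the idiom s[:-n]
doesn’t really apply and we are looking at the previous idiom instead, which gives “the first 0 elements of s”,
an empty sequence, rather than the whole of s.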
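If n can be 0 in your code, one way around the pitfall is to compute the cut point from len(seq) instead of negating n. The helpers last_n and drop_last_n below are hypothetical names for a minimal sketch that assumes n >= 0; the max(..., 0) also makes them behave sensibly when n is larger than the sequence.

```python
def last_n(seq, n):
    """Last n elements of seq; unlike seq[-n:], correct for n == 0 (assumes n >= 0)."""
    return seq[max(len(seq) - n, 0):]


def drop_last_n(seq, n):
    """seq without its last n elements; unlike seq[:-n], correct for n == 0 (assumes n >= 0)."""
    return seq[:max(len(seq) - n, 0)]


s = "Slicing is easy!"
print(repr(s[-0:]), repr(last_n(s, 0)))       # 'Slicing is easy!' ''
print(repr(s[:-0]), repr(drop_last_n(s, 0)))  # '' 'Slicing is easy!'
```

For n of 1 or more, these agree with s[-n:] and s[:-n], so the idioms remain the most readable choice whenever n is known to be positive.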
Empty slices
Something worth noting, which may confuse some people, is that if you get your start and end points mixed up,
you will end up with empty slices, because your start point is to the right of your end point. And because of
negative indices, it is not enough to check whether the start parameter is smaller than the end parameter.
Take a look at the figure below:
Now try to work out why all of these slices are empty:
>>> s = "Slicing is easy!"
>>> s[10:5]
''
>>> s[-6:-10]
''
>>> s[-9:3]
''
>>> s[10:-10]
''
All it takes is looking at the figure above and realising that, in each case, the end point refers to a position
that is to the left of the start point.
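Slicing is also forgiving in a way that plain indexing is not: slice parameters that fall outside the sequence are clipped to its boundaries instead of raising an IndexError. A start point beyond the end is just one more way of getting an empty slice:

```python
s = "Slicing is easy!"   # 16 characters, legal indices 0 to 15

print(repr(s[10:100]))   # ' easy!'  the end point is clipped to len(s)
print(repr(s[-100:3]))   # 'Sli'     the start point is clipped to 0
print(repr(s[20:30]))    # ''        starts past the end: empty, no error

try:
    s[20]                # plain indexing does not clip...
except IndexError as err:
    index_error = str(err)
print(index_error)       # ...so this fails: string index out of range
```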
Examples in code
Ensuring at most n elements
Imagine someone is writing a spellchecker, and they have a function that takes a misspelled word and returns
the top 5 closest suggestions for what the user meant to type.
Here is what that function could look like:
def compute_top_suggestions(misspelled, k, corpus):
    similar = find_similar(misspelled, corpus)
    ordered = rank_suggestions_by_similarity(misspelled, similar)
    top_k = []
    for i in range(min(k, len(ordered))):
        top_k.append(ordered[i])
    return top_k
The final loop there is to make sure you return at most k results. However, the person who wrote this piece
of code did not read this Pydon’t! Because if they had, they would know that you can use slicing to extract at
most k elements from ordered:
def compute_top_suggestions(misspelled, k, corpus):
    similar = find_similar(misspelled, corpus)
    ordered = rank_suggestions_by_similarity(misspelled, similar)
    return ordered[:k]
    # ^ Idiom! Read as “return at most `k` from beginning”
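The functions find_similar and rank_suggestions_by_similarity are not shown, so here is a self-contained sketch in which both are replaced, as an assumption of ours, by ranking every corpus word with difflib.SequenceMatcher from the standard library. Only the final ordered[:k] line reflects the idiom from the text; note that it returns fewer than k suggestions when the corpus is small, without any extra checks.

```python
from difflib import SequenceMatcher


def compute_top_suggestions(misspelled, k, corpus):
    # Assumption: stand-in for find_similar + rank_suggestions_by_similarity.
    # Rank every corpus word by how similar it is to the misspelled word.
    ordered = sorted(
        corpus,
        key=lambda word: SequenceMatcher(None, misspelled, word).ratio(),
        reverse=True,  # most similar first
    )
    return ordered[:k]  # at most k suggestions, even if the corpus is smaller


print(compute_top_suggestions("pyton", 5, ["java", "python"]))  # ['python', 'java']
print(compute_top_suggestions("pyton", 1, ["java", "python"]))  # ['python']
```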
A very similar usage pattern arises when you want to return at most k from the end, but you already knew
that because you read about the four slicing idioms I shared earlier.
This usage pattern can show up in many ways, because here we are using slicing for the semantic meaning
of this particular idiom. Above, we have seen four different idioms, so just keep those in mind when working
with sequences!
Start of a string
Slicing is great, I hope I already convinced you of that, but slicing is not the answer to all of your problems!
A common use case for slices is to check if a given sequence starts with a predefined set of values. For
example, we might want to know if a string starts with the four characters ">>> ", which are the characters
that mark the Python REPL prompt. The doctest Python module, for example, does a similar check, so we
will be able to compare our solution to doctest’s.
You just learned about slicing and you know that s[:4] can be read idiomatically as “the first four characters
of s”, so maybe you would write something like
def check_prompt(line):
    if line[:4] == ">>> ":
        return True
    return False
or, much more elegantly,
def check_prompt(line):
    return line[:4] == ">>> "
However, it is important to note that this is not the best solution possible, because Python strings have an
appropriate method for this type of check: the startswith method.
Therefore, the best solution would be
def check_prompt(line):
    return line.startswith(">>> ")
This is better because this is a tested and trusted function that does exactly what you need, so the code
expresses very clearly what you want. What is more, if you later change the prompt, you don’t need to
remember to also change the index used in the slice.
If we take a look at the actual source code for doctest, what they write is
## Inside _check_prefix from Lib/doctest.py for Python 3.9.2
## ...
if line and not line.startswith(prefix):
    # ...
As we can see here, they are using the startswith method to see if line starts with the prefix given as
argument.
Similar to startswith, strings also define an endswith method.
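One related convenience is worth knowing: str.startswith and str.endswith also accept a tuple of candidate prefixes or suffixes and return True if any of them matches. For example, the tuple below checks for both prompts the Python REPL uses, ">>> " and "... ":

```python
prompts = (">>> ", "... ")   # both prompts used by the Python REPL

print(">>> 3 + 3".startswith(prompts))         # True
print("...     return x".startswith(prompts))  # True
print("6".startswith(prompts))                 # False

# endswith accepts a tuple in exactly the same way.
print("script.py".endswith((".py", ".pyw")))   # True
```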
However, I already have Python 3.9 installed on my machine, so I should be using the string methods that
Python provides me with:
def strip_prefix(line, prefix):
    return line.removeprefix(prefix)
Of course, at this point, defining my own function is redundant and I would just go with
>>> prompt = ">>> "
>>> ">>> 3 + 3".removeprefix(prompt)
'3 + 3'
>>> "6".removeprefix(prompt)
'6'
In case you are interested, Python 3.9 also added a removesuffix method that does the same, but at the
end of strings.
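If you need to support Python versions older than 3.9, where removeprefix and removesuffix do not exist, a minimal sketch of equivalent helpers combines startswith and endswith with slicing. strip_prefix reuses the function name from above and strip_suffix is a hypothetical companion; note the guard in strip_suffix, which avoids the s[:-0] pitfall from the slicing idioms.

```python
def strip_prefix(line, prefix):
    """Like str.removeprefix (Python 3.9+), for older versions."""
    if line.startswith(prefix):
        return line[len(prefix):]
    return line


def strip_suffix(line, suffix):
    """Like str.removesuffix (Python 3.9+), for older versions."""
    # Guard on `suffix`: with an empty suffix, line[:-0] would be line[:0] == "".
    if suffix and line.endswith(suffix):
        return line[:-len(suffix)]
    return line


print(strip_prefix(">>> 3 + 3", ">>> "))  # 3 + 3
print(strip_suffix("script.py", ".py"))   # script
```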
This just goes to show that it is nice to try and stay more or less on top of the features that get added to your
favourite/most used programming languages. Also (!), this shows that one has to be careful when looking
for code snippets online, e.g. on Stack Overflow. Stack Overflow has amazing answers… that get outdated, so
always pay attention to the most voted answers, but also to the most recent ones, as those could contain more
modern approaches.
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“The relationship between slicing and indexing means there are four really nice idiomatic usages
of slices that are well-worth knowing.”
This Pydon’t showed you that:
• slicing sequences lets you access series of consecutive elements;
• you can slice strings, lists, tuples, ranges, and more;
• if the start parameter is omitted, the slice starts from the beginning of the sequence;
• if the end parameter is omitted, the slice ends at the end of the sequence;
• slicing is the same as selecting elements with a for loop and a range with the same parameters;
• much like with plain indexing, negative integers can be used and those count from the end of the
sequence;
• s[n:], s[-n:], s[:n], and s[:-n] are four idiomatic slicing patterns that have a clear semantic meaning:
– s[n:] is “skip the first n elements of s”;
– s[-n:] is “the last n elements of s”;
– s[:n] is “the first n elements of s”;
– s[:-n] is “skip the last n elements of s”;
• slice parameters that are too large are clipped to the length of the sequence, so a slice that starts past the end produces an empty sequence;
• if the parameters are in the wrong order, empty sequences are produced; and
• some operations that seem to ask for slicing might have better alternatives, for example using
startswith, endswith, removeprefix, and removesuffix with strings.
References
• Python 3 Documentation, The Python Language Reference, Expressions – Slicings,
https://docs.python.org/3/reference/expressions.html#slicings [last accessed 20-04-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – The Standard Type Hierarchy,
https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy [last accessed 20-04-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – Emulating Container Types,
https://docs.python.org/3/reference/datamodel.html#emulating-container-types [last accessed 20-04-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, slice,
https://docs.python.org/3/library/functions.html#slice [last accessed 20-04-2021];
• “Effective Python – 90 Specific Ways to Write Better Python”; Slatkin, Brett; ISBN 9780134853987.
Mastering sequence slicing
Introduction
In the previous Pydon’t we looked at using sequence slicing to manipulate sequences, such as strings or lists.
In this Pydon’t we will continue on the subject of slicing sequences, but we will look at the more advanced
topics. In particular, in this Pydon’t you will:
• learn about the step parameter in slicing;
• see how slicing can be used to copy sequences;
• learn about slice assignment; and
• learn some more idiomatic slicing patterns.
As it turns out, there is A LOT to say about sequence slicing, so I will have to split this Pydon’t yet again, and
next time we will finish off the subject of slicing sequences with:
• uncovering the two layers of syntactic sugar surrounding sequence slicing; and
• seeing how to implement slicing for your custom objects.
Slicing step
The next stop in your journey to mastering slicing in Python is knowing about the lesser-used third parameter
in the slice syntax: the step.
>>> s = 'Slicing is easy!'
>>> s[2:14:2]
'iigi a'
What happens first is that s[2:14] tells Python that we only want to work with a part of our original string:
>>> s = 'Slicing is easy!'
>>> s[2:14]
'icing is eas'
Then the step parameter kicks in and tells Python to start at the first element of that part and then jump step positions at a time, like the figure below shows:
This is why we get
>>> s = "Slicing is easy!"
>>> s[2:14:3]
'inie'
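Much like a slice with two parameters, a slice with a positive step selects the same elements as a range with the same three parameters. The helper pick_with_range is hypothetical and assumes non-negative indices within the string and a positive step:

```python
s = "Slicing is easy!"


def pick_with_range(seq, start, stop, step):
    """Hypothetical helper: assumes 0 <= start, stop <= len(seq) and step > 0."""
    return "".join(seq[i] for i in range(start, stop, step))


for step in (1, 2, 3, 4):
    # The slice and the explicit range-based selection agree.
    assert s[2:14:step] == pick_with_range(s, 2, 14, step)
    print(step, repr(s[2:14:step]))
```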
Negative step
We have seen how a positive step parameter behaves; now we will see how a negative one does. This is
where things really get confusing, and at this point it really is easier to understand how the slicing works if
you are comfortable with how range works with three arguments.
When you specify a slice with s[start:stop:step], you will get back the elements of s that are in the indices
pointed to by range(start, stop, step). If step is negative, then the range function will be counting from
start to stop backwards. This means that start needs to be larger than stop, otherwise there is nothing
to count.
For example, range(3, 10) gives the integers 3 to 9. If you want the integers 9 to 3 you can use the step -1,
but you also need to swap the start and stop arguments. Not only that, but you also need to tweak them a
bit. The start argument is the first number that is included in the result and the stop argument is the first
number that isn’t, so if you want the integers from 9 to 3, counting down, you need the start argument to
be 9 and the stop argument to be 2:
>>> list(range(3, 10))
[3, 4, 5, 6, 7, 8, 9]
>>> list(range(3, 10, -1)) # You can't start at 3 and count *down* to 10.
[]
>>> list(range(10, 3, -1)) # Start at 10 and stop right before 3.
[10, 9, 8, 7, 6, 5, 4]
>>> list(range(9, 2, -1)) # Start at 9 and stop right before 2
[9, 8, 7, 6, 5, 4, 3]
If you are a bit confused, that is normal. Take your time to play around with range and get a feel for how this
works.
Using the range results from above and the figure below, you should be able to figure out why these slices
return these values:
>>> s[3:10]
'cing is'
>>> s[3:10:-1]
''
>>> s[10:3:-1]
' si gni'
>>> s[9:2:-1]
'si gnic'
Use the range results above and this figure to help you out:
If you want to use a negative step that is different from -1, the same principle applies: the start parameter
of your string slice should be larger than the stop parameter (so that you can count down from start to
stop) and then the absolute value of the step tells you how many positions you move at a time. Take your
time to work these results out:
>>> s = 'Slicing is easy!'
>>> s[15:2:-1]
'!ysae si gnic'
>>> s[15:2:-2]
'!ses nc'
>>> s[15:2:-3]
'!asgc'
>>> s[15:2:-4]
'!e c'
An important remark is due: while range accepts negative integers as the start and stop arguments and
interprets those as the actual negative numbers, remember that slicing also accepts negative numbers but
those are interpreted in the context of the sequence you are slicing.
What is the implication of this?
It means that if step is negative in the slice s[start:stop:step], then start needs to refer to an element
that is to the right of the element referred to by stop.
I will give you an explicit example of the type of confusion that the above remark is trying to warn you about:
>>> s = 'Slicing is easy!'
>>> list(range(2, -2, -1))
[2, 1, 0, -1]
>>> s[2:-2:-1]
''
>>> s[2]
'i'
>>> s[-2]
'y'
Notice how range(2, -2, -1) has four integers in it but s[2:-2:-1] is an empty slice. Why is that? Because
s[2] is the first “i” in s, while s[-2] is the “y” close to the end of the string. Using a step of -1 would have us
go from the “i” to the “y”, but going right to left… If you start at the “i” and go left, you reach the beginning
of the string, not the “y”.
Perhaps another way to help you look at this is if you recall that s[-2] is the same as s[len(s)-2], which
in this specific case is s[14]. If we take the piece of code above and replace all the -2 with 14, it should
become clearer why the slice is empty:
>>> s = 'Slicing is easy!'
>>> list(range(2, 14, -1))
[]
>>> s[2:14:-1]
''
>>> s[2]
'i'
>>> s[14]
'y'
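The built-in slice type has an indices method that performs exactly this translation: given the length of the sequence, it turns negative or omitted parameters into concrete, non-negative ones. Feeding its result to range is a quick way to see why the slice is empty (a small sketch):

```python
s = "Slicing is easy!"
sl = slice(2, -2, -1)  # the same parameters as s[2:-2:-1]

# indices() resolves -2 relative to len(s), just like s[-2] does.
start, stop, step = sl.indices(len(s))
print(start, stop, step)               # 2 14 -1
print(list(range(start, stop, step)))  # []: you cannot count down from 2 to 14
print(repr(s[sl]))                     # ''
```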
Reversing and then skipping
Another possible way to get you more comfortable with these negative steps is if you notice the relationship
between slices with a step of the form -n and two consecutive slices with steps -1 and n:
>>> s = 'Slicing is easy!'
>>> s[14:3:-2]
'ya igi'
>>> s[14:3:-1]
'ysae si gni'
>>> s[14:3:-1][::2]
'ya igi'
We can take this even further, and realise that the start and stop parameters are used to shorten the sequence,
and that the step parameter is only then used to skip elements:
>>> s = 'Slicing is easy!'
>>> s[14:3:-2]
'ya igi'
>>> s[4:15] # Swap `start` and `stop` and add 1...
'ing is easy'
>>> s[4:15][::-1] # ...then reverse...
'ysae si gni'
>>> s[4:15][::-1][::2] # ...then pick every other element.
'ya igi'
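The relationship is not specific to this one slice. A brute-force sketch can compare s[start:stop:-n] with s[start:stop:-1][::n] for every start and stop position of the string (plus None, standing in for an omitted parameter) and a few values of n:

```python
s = "Slicing is easy!"
positions = list(range(len(s))) + [None]  # None stands for an omitted parameter

for n in (1, 2, 3, 4):
    for start in positions:
        for stop in positions:
            one_step = s[start:stop:-n]
            # Reverse first (step -1), then keep every n-th character.
            two_steps = s[start:stop:-1][::n]
            assert one_step == two_steps, (start, stop, n)

print("s[a:b:-n] == s[a:b:-1][::n] for every combination tried")
```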
Zero
For the sake of completeness, let’s just briefly mention what happens if you use 0 as the step parameter,
given that we have taken a look at strictly positive steps and strictly negative steps:
>>> s = "Slicing is easy!"
>>> s[::0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: slice step cannot be zero
Using 0 as the step gives a ValueError, and that is all there is to it.
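The same rule exists for range, so the parallel between the two holds here too. Note that Python only complains when the zero step is actually used: in this sketch, creating a slice object with a step of 0 is allowed, but slicing with it is not.

```python
s = "Slicing is easy!"

zero_step = slice(None, None, 0)  # creating the slice object is allowed...

errors = {}
try:
    s[zero_step]                  # ...but using it to slice is not
except ValueError as err:
    errors["slice"] = str(err)

try:
    range(0, 10, 0)               # range rejects a zero step as well
except ValueError as err:
    errors["range"] = str(err)

print(errors)
```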
Recommendations
Did you know that the Python Standard Library (PSL) has around 6000 usages of sequence slicing, but fewer
than 500 of those make use of the step parameter? That means that, in the PSL, at most around 8.33% (500
out of 6000) of the slicing operations make use of the step. (These are rough figures for the PSL of Python
3.9.2 on my Windows machine.)
If I had to guess, I would say there are two main reasons that explain why only a “small” percentage of all
the slices make use of the step parameter:
• using a step different from 1 is a very specific operation that only makes sense on a few occasions and
depends a lot on how you have structured your data; and
• step parameters other than 1 and -1 make your code much harder to read.
For those reasons, I recommend that you do not get overexcited about slices and force your data into a
shape where slices are the best way to get to it.
For example, do not store colour names and their hexadecimal values in an alternating fashion just so that
you can use [::2] and [1::2] to access them. However, if – for some reason – you receive data in this
format, it is perfectly acceptable for you to split the data with two slices:
## Assume we got `colours` in this format from an API or some other place...
>>> colours = ["red", "#ff0000", "green", "#00ff00", "blue", "#0000ff"]
>>> names = colours[::2]
>>> names
['red', 'green', 'blue']
>>> hexs = colours[1::2]
>>> hexs
['#ff0000', '#00ff00', '#0000ff']
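Once the data is split, a natural follow-up (an illustration, assuming you want to look up hex codes by colour name) is to pair the two slices with zip:

```python
colours = ["red", "#ff0000", "green", "#00ff00", "blue", "#0000ff"]

# Even positions hold the names, odd positions hold the matching hex codes.
hex_by_name = dict(zip(colours[::2], colours[1::2]))
print(hex_by_name["green"])  # #00ff00
```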
Slices with three parameters tend to be dense and hard to parse with your eyes, given that they are enclosed
in [] and have their parameters separated by colons. If you write a slice of the form s[a:b:c], you can expect
the readers of your code to have to pause for a bit to understand what is going on. For that reason, when you
write a long or complex slice, first consider reworking the code so that you don’t need it. If you do end up
writing one, you should probably add a comment explaining what the slice does.
I had a look at how the Python Standard Library makes use of slicing with three parameters, and I found this
nice example taken from the source code of the dataclasses module:
## From Lib/dataclasses.py, Python 3.9.2
def _process_class(cls, init, repr, eq, order, unsafe_hash, frozen):
    # [code deleted for brevity]
>>> s = "Slicing is easy!"
>>> s[-1:0:-1]
'!ysae si gnicil'
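One way to read s[-1:0:-1] is “start at the last element and walk backwards, stopping just before index 0”, that is, the sequence reversed without its first element. The sketch below checks that reading on the string and then applies the same slice to a class’s method resolution order (__mro__), a tuple that lists the class itself first and object last; the classes A, B, and C are made up for the illustration.

```python
s = "Slicing is easy!"
# Reversed, but without the first character "S".
assert s[-1:0:-1] == s[1:][::-1] == "!ysae si gnicil"


class A:
    pass


class B(A):
    pass


class C(B):
    pass


print([cls.__name__ for cls in C.__mro__])           # ['C', 'B', 'A', 'object']
# Walk the MRO from the most generic class down, skipping C itself.
print([cls.__name__ for cls in C.__mro__[-1:0:-1]])  # ['object', 'A', 'B']
```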
Sequence copying
Having taken a look at many different ways to slice and dice sequences, it is now time to mention a very
important nuance about sequence slicing: when we create a slice, we are effectively creating a (shallow) copy
of the part of the original sequence that the slice selects. This isn’t necessarily a bad thing. For example,
there is one idiomatic slicing operation that makes use of this behaviour.
I brought this up because it is important that you are aware of these subtleties, so that you can make informed
decisions about the way you write your code.
An example of when this copying behaviour might be undesirable is when you have a really large list and you
are considering using a slice to iterate over just a portion of that list. In this case, using the slice may be a
waste of resources, because all you want is to iterate over a specific section of the list and then you are
done; you don’t actually need to have that sublist later down the road.
In this case, what you might want to use is the islice function from the itertools module, which creates
an iterator that allows you to iterate over the portion of the list that you care about.
Iterators are another awesome feature in Python, and I’ll be exploring them in future Pydon’ts, so stay tuned
for that!
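A minimal sketch of the difference, assuming all you need is the sum of a stretch of a large list:

```python
from itertools import islice

big = list(range(1_000_000))

# Slicing builds a new 10-element list before summing it.
total_from_copy = sum(big[10:20])
# islice yields the same elements one by one, without building a sublist.
total_from_islice = sum(islice(big, 10, 20))

print(total_from_copy, total_from_islice)  # 145 145
```

One caveat: islice cannot jump ahead, so it still steps over the first 10 elements one at a time. For a start position deep into a list, that walk has a cost of its own.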
A simple way for you to verify that slicing creates copies of the sliced sequences is as follows:
>>> l = [1, 2, 3, 4]
>>> l2 = l
>>> l.append(5) # Append 5 to l...
>>> l2 # ... notice that l2 also got the new 5,
# so l2 = l did NOT copy l.
[1, 2, 3, 4, 5]
>>> l3 = l[2:5] # Slice l into l3.
>>> l3
[3, 4, 5]
>>> l[3] = 42 # Change a value of l...
>>> l
[1, 2, 3, 42, 5] # ... the 4 was replaced by 42...
>>> l3
[3, 4, 5] # ... but l3 still contains the original 4.
examples of immutable sequences are strings.
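The copy a slice makes is shallow: the new list holds references to the same objects as the original. That matters when the elements are themselves mutable, as this short sketch with nested lists shows:

```python
rows = [[1, 2], [3, 4]]
first_row_only = rows[:1]      # a new outer list...

first_row_only.append([5, 6])  # ...so appending to it leaves rows alone,
print(rows)                    # [[1, 2], [3, 4]]

first_row_only[0].append(99)   # but the inner list is shared,
print(rows)                    # [[1, 2, 99], [3, 4]]
```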
Slice assignment
Say that l is a list. We are used to “regular” assignment,
>>> l = [1, 2, 3, 4]
and we are used to assigning to specific indices:
>>> l[2] = 30
>>> l
[1, 2, 30, 4]
So how about assigning to slices as well? That is perfectly fine!
>>> l[:2] = [10, 20] # Replace the first 2 elements of l.
>>> l
[10, 20, 30, 4]
>>> l[1::2] = [200, 400] # Replace the elements in odd positions.
>>> l
[10, 200, 30, 400]
The two short examples above showed how to replace some elements with the same number of elements.
However, with simple slices (without the step parameter), the replacement can have a different number of
elements, which changes the length of the list:
>>> l = [1, 2, 3, 4]
>>> l[:2] = [0, 0, 0, 0, 0]
>>> l
[0, 0, 0, 0, 0, 3, 4]
When you have a slicing assignment like that, you should read it as “replace the slice on the left with the new
sequence on the right”, so the example above reads “replace the first two elements of l with five zeroes”.
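Because the replacement can have any length, an empty slice (one where start equals stop) can be used to insert elements without removing any. A short sketch:

```python
l = [1, 2, 3, 4]

l[1:1] = ["a", "b"]  # empty slice at position 1: pure insertion
print(l)             # [1, 'a', 'b', 2, 3, 4]

l[len(l):] = [5, 6]  # empty slice at the end: same effect as l.extend([5, 6])
print(l)             # [1, 'a', 'b', 2, 3, 4, 5, 6]
```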
Notice that, if you use “extended slices” (slices with the step parameter), then the number of elements on
the left and on the right must match:
>>> l = [1, 2, 3, 4]
>>> l[::2] # This slice has two elements in it...
[1, 3]
>>> l[::2] = [0, 0, 0, 0, 0] # ... and we try to replace those with 5 elements.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 5 to extended slice of size 2
The fact that you can assign to slices allows you to write some pretty beautiful things, if you ask me.
For example, as I was exploring the Python Standard Library, I came across a slicing assignment gem inside
the urljoin function. The urljoin function from the urllib.parse module takes a base URL and a (possibly
relative) URL, and tries to combine the two to create an absolute URL. Here is an example:
>>> import urllib.parse
>>> urllib.parse.urljoin("https://mathspp.com/blog/", "pydonts/zip-up")
'https://mathspp.com/blog/pydonts/zip-up'
I’m using urllib.parse.urljoin to take the base URL for my blog and stitch that together with a relative
link that takes me to one of the Pydon’ts I have published. Now let me show you part of the source code of
that function:
## From Lib/urllib/parse.py in Python 3.9.2
def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    # [code deleted for brevity]

    # for rfc3986, ignore all base path should the first character be root.
    if path[:1] == '/':
        segments = path.split('/')
    else:
        segments = base_parts + path.split('/')
        # filter out elements that would cause redundant slashes on re-joining
        # the resolved_path
        segments[1:-1] = filter(None, segments[1:-1])
Notice the slice assignment to segments[1:-1]? That segments list contains the different portions of the
two URLs I give the urljoin function, and then the filter function is used to filter out the parts of the URL
that are empty. Let me edit the source code of urljoin to add two print statements to it:
## From Lib/urllib/parse.py in Python 3.9.2
def urljoin(base, url, allow_fragments=True):
    """Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter."""
    # [code deleted for brevity]

    # for rfc3986, ignore all base path should the first character be root.
    if path[:1] == '/':
        segments = path.split('/')
    else:
        segments = base_parts + path.split('/')
        # filter out elements that would cause redundant slashes on re-joining
        # the resolved_path
        print(segments)
        segments[1:-1] = filter(None, segments[1:-1])
        print(segments)
Now let me run the same example:
>>> import urllib.parse
>>> urllib.parse.urljoin("https://mathspp.com/blog/", "pydonts/zip-up")
['', 'blog', '', 'pydonts', 'zip-up'] # First `print(segments)`
['', 'blog', 'pydonts', 'zip-up'] # <----- segments has one fewer '' in it!
'https://mathspp.com/blog/pydonts/zip-up'
We can take the result of the first print and run the filter by hand:
>>> segments = ['', 'blog', '', 'pydonts', 'zip-up']
>>> segments[1:-1]
['blog', '', 'pydonts']
>>> list(filter(None, segments[1:-1]))
['blog', 'pydonts']
>>> segments[1:-1] = filter(None, segments[1:-1])
>>> segments
['', 'blog', 'pydonts', 'zip-up']
So this was a very interesting example usage of slice assignment. It is likely that you won’t be doing something
like this very frequently, but knowing about it means that when you do, you will be able to write that
piece of code beautifully.
Slice deletion
If you can assign to slices, what happens if you assign the empty list [] to a slice?
>>> l = [1, 2, 3, 4]
>>> l[:2] = [] # Replace the first two elements with the empty list.
>>> l
[3, 4]
If you assign the empty list to a slice, you are effectively deleting those elements from the list. You can do
this by assigning the empty list, but you can also use the del keyword for the same effect:
>>> l = [1, 2, 3, 4]
>>> del l[:2]
>>> l
[3, 4]
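Unlike assignment, deletion has no lengths to match, so del also works with extended slices. For example, this sketch drops every element in an even position:

```python
l = list(range(10))
del l[::2]   # remove positions 0, 2, 4, 6 and 8
print(l)     # [1, 3, 5, 7, 9]
```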
Even positions and odd positions
A simple slice that you may want to keep in the back of your mind is the slice that lets you access all the
elements in the even positions of a sequence. That slice is [::2]:
>>> l = ["even", "odd", "even", "odd", "even"]
>>> l[::2]
['even', 'even', 'even']
Similarly, l[1::2] gives you the odd positions:
>>> l = ["even", "odd", "even", "odd", "even"]
>>> l[1::2]
['odd', 'odd']
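The two slices also work in reverse through slice assignment: given the even-position and odd-position elements separately, you can interleave them back into one list. A sketch, assuming the first list has the same length as the second or one element more:

```python
evens = ["even", "even", "even"]
odds = ["odd", "odd"]

merged = [None] * (len(evens) + len(odds))  # placeholder list of the right size
merged[::2] = evens   # extended slices: the lengths must match exactly
merged[1::2] = odds
print(merged)         # ['even', 'odd', 'even', 'odd', 'even']
```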
s[::-1]
A slice with no start and stop parameters and a -1 in the step is a very common slicing pattern. In fact, there
are approximately 100 of these slices in the Python Standard Library, which is roughly one third of all the
slices that make use of the step parameter.
s[::-1] should be read as “the sequence s, but reversed”. Here is a simple example:
>>> s = "Slicing is easy!"
>>> s[::-1]
'!ysae si gnicilS'
What is noteworthy here, and related to the previous remark about slices creating copies, is that sometimes
you don’t want to copy the whole thing to reverse your sequence; for example, if all you want to do is iterate
over the sequence in reverse order. When that is the case, you might want to just use the reversed built-in
function. This function takes a sequence and allows you to iterate over the sequence in reverse order, without
paying the extra memory cost of actually copying the whole sequence.
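A short sketch comparing the two approaches: reversed gives you an iterator instead of a new sequence, so nothing is copied unless you explicitly build a new object from it.

```python
s = "Slicing is easy!"
l = [1, 2, 3, 4]

backwards = reversed(l)   # an iterator over l; no new list is built
for item in backwards:
    print(item)           # 4, then 3, 2 and 1

# When you do need a new object, both spellings give the same result:
assert list(reversed(l)) == l[::-1] == [4, 3, 2, 1]
assert "".join(reversed(s)) == s[::-1] == "!ysae si gnicilS"
```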
l[:] or l[::]
If a slice makes a copy, that means that a slice is a very clean way to copy a sequence! The slices [:] and
[::] select whole sequences, so those are prime ways to copy a sequence – for example, a list – when you
really want to create copies.
Deep and shallow copies, the distinction between things that are passed by reference and things that are
passed by value, etc., are a big discussion in themselves.
It is easy to search the Python Standard Library for usage examples of this idiom (and for the ones before
as well), so I will just leave you with one, from the argparse module, that contains a helper function named
_copy_items (I deleted its comments):
## From Lib/argparse.py in Python 3.9.2
def _copy_items(items):
    if items is None:
        return []
    if type(items) is list:
        return items[:]
    import copy
    return copy.copy(items)
Notice how the idiom fits in so nicely with the function name: the function says it copies the items. What
does the function do? If the items argument is a list, then it returns a copy of it! So l[:] and l[::] should
be read as “a copy of l”.
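A quick sketch confirming that l[:] and l[::] produce a new list with the same contents, just like the list.copy method and copy.copy do:

```python
import copy

l = [1, 2, 3, 4]
copies = [l[:], l[::], l.copy(), copy.copy(l)]

for c in copies:
    assert c == l      # same contents...
    assert c is not l  # ...but a different list object
```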
This idiom also explains the thumbnail image in the beginning of the article.
del l[:]
Another idiom that makes use of the slice [:], but with something extra, is the idiom to delete the contents
of a list.
Think of l[:] as “opening up l”, and then del l[:] reads “open up l to delete its contents”. This is the
same as doing l[:] = [] but it is not the same as doing l = [] nor is it the same as doing del l.
It is easy to see why del l is different from the others: del l means that the name l is no longer in use:
>>> l = [1, 2, 3, 4]
>>> del l
>>> l
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'l' is not defined
whereas the idiom just clears the list up:
>>> l = [1, 2, 3, 4]
>>> del l[:]
>>> l
[]
What might be trickier to understand is why del l[:] and l[:] = [] are different from l = []. I’ll show
you an example that shows they are clearly different, and then I will leave it up to you to decide whether or
not you want to burn enough neurons to understand what is going on.
First, let me use l[:] = ...
>>> l = l_shallow = [1, 2, 3]
>>> l_shallow is l
True
>>> j = []
>>> l[:] = j
>>> l
[]
>>> l_shallow
[]
>>> l is j
False
>>> l_shallow is l
True
and now let me compare it with l = ...
>>> l = l_shallow = [1, 2, 3]
>>> l_shallow is l
True
>>> j = []
>>> l = j
>>> l
[]
>>> l_shallow
[1, 2, 3]
>>> l is j
True
>>> l_shallow is l
False
You can see above that the results of comparisons like l is j and l_shallow is l, as well as the contents
of l_shallow, change in the two examples. Therefore, the two things cannot be the same. What is going on?
Well, deep and shallow copies, and references to mutable objects, and the like, are at fault! I’ll defer a more
in-depth discussion of this for a later Pydon’t, as this one has already become quite long.
Just remember, l[:] = [] and del l[:] can be read as “delete the contents of l”.
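One way to see the difference is to look at object identities: del l[:] (like l[:] = []) changes the list object that l already refers to, so every other name for that object sees the change, while l = [] makes the name l point to a brand new list. A sketch, also using list.clear, which empties a list in place just like del l[:]:

```python
l = l_shallow = [1, 2, 3]
original_id = id(l)

del l[:]                     # empties the existing list in place
assert id(l) == original_id and l_shallow == []

l_shallow.extend([1, 2, 3])
l.clear()                    # list.clear() also empties it in place
assert id(l) == original_id and l_shallow == []

l_shallow.extend([1, 2, 3])
l = []                       # rebinds the name l to a new, empty list
assert id(l) != original_id and l_shallow == [1, 2, 3]
```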
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Slices are really powerful and they are an essential tool to master for when you work with sequences,
like strings and lists.”
This Pydon’t showed you that:
• slices can have a step parameter that allows you to skip elements of the sequence;
• the default value of the step parameter is 1;
• a negative step allows you to pick elements from the end of the sequence to the start;
• when using a negative step, the start parameter should refer to an element of the sequence that is to
the right of the element referred to by the stop parameter;
• there is a parallelism between slices (with negative steps) and the built-in range function;
• 0 is not a valid step parameter for a slice;
• slices are more common with just the start and stop parameters, in part because slices with
[start:stop:step] can be really hard to read;
• slices create copies of the parts of the sequences we are looking at, so you have to be mindful of that
when memory is constrained;
• you can assign to slices of mutable objects, like lists;
• when assigning to a slice, the final length of the sequence might change if we use a simple slice on the
left (without the step parameter) and if the sequence on the right has a different number of elements;
• you can use the del keyword to delete slices of mutable sequences, or you can also assign the empty
sequence to those slices for the same effect;
• there are some interesting idiomatic slices that you should be aware of:
– s[::2] and s[1::2] are “elements in even positions of s” and “elements in odd positions of s”,
respectively;
– s[::-1] is “s, but reversed”;
– l[:] and l[::] are “a copy of l”; and
– del l[:] is “delete the contents of l” or “empty l”, which is not the same as doing l = [].
References
• Python 3 Documentation, The Python Language Reference, Expressions – Slicings,
https://docs.python.org/3/reference/expressions.html#slicings [last accessed 20-04-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – The Standard Type Hierarchy,
https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy [last accessed 20-04-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – Emulating Container Types,
https://docs.python.org/3/reference/datamodel.html#emulating-container-types [last accessed 20-04-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, slice,
https://docs.python.org/3/library/functions.html#slice [last accessed 20-04-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, range,
https://docs.python.org/3/library/functions.html#func-range [last accessed 03-05-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, filter,
https://docs.python.org/3/library/functions.html#filter [last accessed 11-05-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, reversed,
https://docs.python.org/3/library/functions.html#reversed [last accessed 11-05-2021];
• Python 3 Documentation, The Python Standard Library, dataclasses,
https://docs.python.org/3/library/dataclasses.html [last accessed 11-05-2021];
• Python 3 Documentation, The Python Standard Library, itertools, islice,
https://docs.python.org/3/library/itertools.html#itertools.islice [last accessed 11-05-2021];
• Python 3 Documentation, The Python Standard Library, urllib.parse, urljoin,
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin [last accessed 11-05-2021];
• “Effective Python – 90 Specific Ways to Write Better Python”; Slatkin, Brett; ISBN 9780134853987;
• Stack Overflow, “Why would I want to use itertools.islice instead of normal list slicing?”,
https://stackoverflow.com/q/32172612/2828287 [last accessed 10-05-2021].
Inner workings of sequence slicing
Introduction
We have written two Pydon’ts already on sequence slicing:
1. “Idiomatic sequence slicing”; and
2. “Mastering sequence slicing”.
Those two Pydon’ts taught you almost everything there is to know about sequence slicing, but there are two
things that we will only take a look at today:
• uncovering the two layers of syntactic sugar surrounding sequence slicing; and
• seeing how to implement slicing for your custom objects.
If you don’t really know how sequence slicing works, you might want to take a look at the Pydon’ts I linked
above. In particular, the Pydon’t on mastering sequence slicing can really help you take your Python slicing
skills to the next level.
Without further ado, let us begin!
Slicing parameters
If we read the docs, or if we play around with the slice built-in enough, we find out that this object stores
the slicing parameters that we repeatedly talked about in the previous Pydon’ts. These parameters are the
start, stop, and step parameters of the slice, and the docs tell us that we can access them:
>>> sl = slice(1, 12, 3)
>>> sl.start
1
>>> sl.stop
12
>>> sl.step
3
However, we cannot modify them:
>>> sl = slice(None, 3, None)
>>> print(sl.start)
None
>>> sl.start = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: readonly attribute
Notice how, in the example above, we use None when creating a slice object to specify an omitted slicing
parameter, such as the stop parameter in the slice s[2::3], which would go between the two colons.
By the way, be careful when naming your slice objects! The most obvious name is slice, but if you create a
slice with that name then you will have a hard time creating other slice objects, because you will have
overwritten the name of the built-in type. This is also why you shouldn’t name your strings str or your integers int.
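Following that advice, a slice object can be stored under a descriptive name and reused wherever the same slice is needed. A sketch, where the name every_other is just an illustration:

```python
s = "Slicing is easy!"

every_other = slice(None, None, 2)  # equivalent to the [::2] in s[::2]
# Avoid `slice = ...`: it would shadow the built-in type.
print(s[every_other])                     # Siigi ay
print([10, 20, 30, 40, 50][every_other])  # [10, 30, 50]
```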
...         # Just let the built-in string handle indexing:
...         return super().__getitem__(idx)
...
>>> s = S("Slicing is easy!")
>>> s[3]
The argument was: 3
'c'
>>> s[1::2]
The argument was: slice(1, None, 2)
'lcn ses!'
As you can see above, we tried slicing the string with s[1::2] and that was converted to slice(1, None,
2) by the time it got to the __getitem__ call!
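A self-contained version of such a class could look like the sketch below. The class name S and the printed message match the session above; subclassing str and the rest of the definition are assumptions made for the illustration.

```python
class S(str):
    """A str that reports what its __getitem__ receives (illustration only)."""

    def __getitem__(self, idx):
        print("The argument was:", idx)
        # Just let the built-in string handle indexing:
        return super().__getitem__(idx)


s = S("Slicing is easy!")
print(s[3])     # prints "The argument was: 3", then c
print(s[1::2])  # prints "The argument was: slice(1, None, 2)", then lcn ses!

# The bracket syntax is sugar for an explicit __getitem__ call with a slice:
assert s[1::3] == s.__getitem__(slice(1, None, 3))
```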
This shows the two bits of syntactic sugar going on: using the colon syntax for slices, start:stop:step, is
just syntactic sugar for creating an explicit slice object, and using brackets [] to index/slice is just syntactic
sugar for a call to the __getitem__ method:
>>> s = "Slicing is easy!"
>>> s[1::3]
'li  s'
>>> s.__getitem__(slice(1, None, 3))
'li  s'
This shows that you can support indexing/slicing in your own custom objects by implementing the
__getitem__ method. I will show you an example of this below.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
Python complained, but not about the syntax. It is strings that cannot handle the indices, and the extra slice,
that you passed to __getitem__. Compare this with an actual SyntaxError:
>>> for in range(10):
File "<stdin>", line 1
for in range(10):
^
SyntaxError: invalid syntax
I couldn’t even change lines to continue my make-believe for loop; Python outright complained about the
syntax being wrong.
However, in your custom objects, you can add support for multiple indexing/slicing:
>>> class Seq:
...     def __getitem__(self, idx):
...         print(idx)
...
>>> s = Seq()
>>> s[1, 2, 3, 4:16:2]
(1, 2, 3, slice(4, 16, 2))
As you can see, the multiple indices and slices get packed into a tuple, which is then passed in to
__getitem__.
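The same packing happens for assignment and deletion, which go through __setitem__ and __delitem__. A sketch with a hypothetical Recorder class that only logs the key each of the three methods receives:

```python
class Recorder:
    """Hypothetical container that records the keys it is given."""

    def __init__(self):
        self.log = []

    def __getitem__(self, key):
        self.log.append(("get", key))

    def __setitem__(self, key, value):
        self.log.append(("set", key, value))

    def __delitem__(self, key):
        self.log.append(("del", key))


r = Recorder()
r[1, 2:3]         # key is the tuple (1, slice(2, 3, None))
r[::2] = "value"  # key is slice(None, None, 2)
del r[0, 1]       # key is the tuple (0, 1)
for entry in r.log:
    print(entry)
```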
We have taken a look at how slices work under the hood, and also took a sneak peek at how regular indexing
works, and now we will go through a couple of examples in code where these things could be helpful.
Examples in code
Bear in mind that it is likely that you won’t be using explicit slice objects in your day-to-day code. The
scarcity of usage examples of slice in the Python Standard Library backs my claim.
Most usages of slice I found were for testing other objects’ implementations, and then I found a couple
(literally two) usages in the xml module, but to be completely honest with you, I did not understand why they
were being used! (Do let me know if you can explain to me what is happening there!)
itertools.islice
The first example we will be using is from the itertools module’s islice function. The islice function
can be used to slice into an iterator, much like regular slicing, with two key differences:
• islice does not work with negative parameters; and
• islice works with generic iterables, which is the main reason why islice is useful.
Iterables and generators are fascinating things in Python and there will be future Pydon’ts on this subject.
Stay tuned for those.
Without going into too much detail about the iterables, let me show you a clear example of when regular
slicing doesn’t work but islice works:
>>> f = lambda x: x # function that returns its input.
>>> f(3)
3
>>> f([1, 2, "Hey"])
[1, 2, 'Hey']
>>> s = "Slicing is easy!"
>>> s[2::3]
'iniey'
>>> m = map(f, s) # `m` is an iterable with the characters from `s`.
>>> m[2::3] # regular slicing doesn't work...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'map' object is not subscriptable
>>> import itertools
>>> for char in itertools.islice(m, 2, None, 3):
... print(char)
...
i
n
i
e
y
The example above just shows that islice works in some situations where regular slicing with
[start:stop:step] doesn’t. The documentation for islice provides an approximate Python implementation
of islice (the actual function is written in C):
## From https://docs.python.org/3/library/itertools.html#itertools.islice,
## accessed on the 18th of May 2021
def islice(iterable, *args):
    # (Some comments removed for brevity...)
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    # (Code sliced for brevity, pun much intended.)
    # ...
In the example above, the slice object is being used just as a utility to map the arguments given to islice
onto the parameters that need to go into the range call, two lines further down.
Another noteworthy thing is the line that assigns to start, stop, and step with the or operators. The or is
being used to assign default values to the parameters, in case the original argument was None:
>>> start = 4 # If `start` has a value,
>>> start or 0 # then we get that value.
4
>>> start = None # However, if `start` is `None`,
>>> start or 0 # then we get the default value of `0`.
0
## Similarly for the `stop` and `step` parameters;
## here is another example with `stop`:
>>> import sys
>>> stop = 4
>>> stop or sys.maxsize
4
>>> stop = None
>>> stop or sys.maxsize
9223372036854775807
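One subtlety of this idiom: or replaces every falsy value, not just None. For start this makes no difference, because the default is 0 anyway, but an explicit stop of 0 is turned into sys.maxsize (and an explicit, invalid step of 0 is quietly turned into 1). A short sketch comparing or with an explicit None check:

```python
import sys

stop = 0  # an explicit, legitimate stop value
print(stop or sys.maxsize)                    # sys.maxsize, not 0!
print(sys.maxsize if stop is None else stop)  # 0
```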
The short-circuiting capabilities of the or operator (and also of the and) will be discussed in detail in a later
Pydon’t, don’t worry!
To conclude this example, we see that slice can be useful in the niche use-case of dispatching range-like
arguments to their correct positions, because you can read the parameters off of a slice object.
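One more difference from regular slicing is worth keeping in mind: islice consumes the iterator it works on, so the items it yields are gone from that iterator afterwards. A sketch:

```python
from itertools import islice

it = iter(range(10))
print(list(islice(it, 3)))  # [0, 1, 2]
print(next(it))             # 3: the first three items were consumed by islice
```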
    def __str__(self):
        return f"GeometricProgression({self.start}, {self.ratio})"

gp = GeometricProgression(1, 3)
print(gp) # prints GeometricProgression(1, 3)
Now, geometric progressions have infinitely many terms, so we cannot just generate “all terms” of the
progression and return them in a list or something like that. If we want to support indexing and/or slicing,
we need to do something else… We need to implement __getitem__!
Let us implement __getitem__ in such a way that it returns a list with all the elements that the user tried to
fetch:
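import sys

class GeometricProgression:
    def __init__(self, start, ratio):
        self.start = start
        self.ratio = ratio
    def __str__(self):
        return f"GeometricProgression({self.start}, {self.ratio})"
    def nth(self, n):
        # The n-th term, counting from 0, is start * ratio**n.
        return self.start * self.ratio ** n
    def __getitem__(self, idx):
        if isinstance(idx, int):
            return self.nth(idx)
        elif isinstance(idx, slice):
            start, stop, step = idx.start or 0, idx.stop or sys.maxsize, idx.step or 1
            return [self.nth(n) for n in range(start, stop, step)]
        else:
            raise TypeError("Geo. progression indices should be integers or slices.")

gp = GeometricProgression(1, 3)
print(gp[0]) # prints 1
print(gp[1]) # prints 3
print(gp[2]) # prints 9
print(gp[0:3]) # prints [1, 3, 9]
print(gp[1:10:3]) # prints [3, 81, 2187]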
As you can see, our implementation already supports slicing and indexing, but we can take this just a little
bit further, and add support for multiple indices/slices with ease:
import sys

class GeometricProgression:
    def __init__(self, start, ratio):
        self.start = start
        self.ratio = ratio

    def __str__(self):
        return f"GeometricProgression({self.start}, {self.ratio})"

    def nth(self, n):
        # The n-th term, counting from 0, is start * ratio**n.
        return self.start * self.ratio ** n

    def __getitem__(self, idx):
        if isinstance(idx, int):
            return self.nth(idx)
        elif isinstance(idx, slice):
            start, stop, step = idx.start or 0, idx.stop or sys.maxsize, idx.step or 1
            return [self.nth(n) for n in range(start, stop, step)]
        elif isinstance(idx, tuple):
            return [self.__getitem__(sub_idx) for sub_idx in idx]
        else:
            raise TypeError("Geo. progression indices should be integers or slices.")

gp = GeometricProgression(1, 3)
print(gp[0, 1, 4]) # prints [1, 3, 81]
print(gp[0:2, 0:2, 1, 0:2]) # prints [[1, 3], [1, 3], 3, [1, 3]]
And that is it: this shows you a (simple) working example of how you could define indexing and slicing for
your own objects.
You can find this simple implementation on GitHub, in case you need it.
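As a hypothetical variation (not part of the implementation above): because the progression is infinite, a slice without a stop cannot be turned into a list, but it can be returned as a lazy iterator. The sketch below also uses explicit None checks, so that a stop of 0 gives an empty list instead of being mistaken for a missing stop, and rejects negative indices and non-positive steps, which have no sensible meaning here.

```python
from itertools import count, islice


class LazyGeometricProgression:
    """Hypothetical variant: open-ended slices return an iterator."""

    def __init__(self, start, ratio):
        self.start = start
        self.ratio = ratio

    def nth(self, n):
        return self.start * self.ratio ** n

    def __getitem__(self, idx):
        if isinstance(idx, int):
            if idx < 0:
                raise IndexError("an infinite progression has no last term")
            return self.nth(idx)
        if isinstance(idx, slice):
            start = 0 if idx.start is None else idx.start
            step = 1 if idx.step is None else idx.step
            if start < 0 or step <= 0:
                raise ValueError("only non-negative starts and positive steps")
            if idx.stop is None:
                # No stop: produce terms lazily, forever.
                return map(self.nth, count(start, step))
            return [self.nth(n) for n in range(start, idx.stop, step)]
        raise TypeError("indices must be integers or slices")


lgp = LazyGeometricProgression(1, 3)
print(lgp[0:0])                  # []
print(list(islice(lgp[1:], 4)))  # [3, 9, 27, 81]
```

Returning an iterator for open-ended slices mirrors what islice does for iterables: the caller decides how many terms to actually consume.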
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Sequence slicing hides two layers of syntactic sugar for you, but you do need to know about
them if you want to write custom objects that support indexing and/or slicing.”
This Pydon’t showed you that:
• there is a built-in slice type in Python;
• the syntax [start:stop:step] is just syntactic sugar for slice(start, stop, step);
• slice(start, stop, step) represents the indices of range(start, stop, step);
• when you use seq[] to index/slice into seq, you actually call the __getitem__ method of seq;
• __getitem__, __setitem__, and __delitem__, are the three methods that you would need in custom
objects to emulate indexing, indexing assignment and indexing deletion;
• Python syntax allows for multiple indices/slices separated by commas;
• itertools.islice can be used with iterables, whereas plain slicing cannot; and
• it can be fairly straightforward to implement (multiple) indexing/slicing for your own objects.
References
• Python 3 Documentation, The Python Language Reference, Expressions – Slicings,
https://docs.python.org/3/reference/expressions.html#slicings [last accessed 18-05-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – The Standard Type Hierarchy,
https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy [last accessed 20-04-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – Emulating Container Types,
https://docs.python.org/3/reference/datamodel.html#emulating-container-types [last accessed 18-05-2021];
• Python 3 Documentation, The Python Language Reference, Data Model – Emulating Container Types,
__getitem__, https://docs.python.org/3/reference/datamodel.html#object.__getitem__ [last accessed 18-05-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, slice,
https://docs.python.org/3/library/functions.html#slice [last accessed 18-05-2021];
• Python 3 Documentation, The Python Standard Library, itertools, islice,
https://docs.python.org/3/library/itertools.html#itertools.islice [last accessed 18-05-2021];
• Stack Overflow, “Why would I want to use itertools.islice instead of normal list slicing?”,
https://stackoverflow.com/q/32172612/2828287 [last accessed 18-05-2021].
Boolean short-circuiting
Introduction
In this Pydon’t we will take a closer look at how and and or really work and at a couple of really neat things
you can do because of the way they are defined. In particular, we will look at
• the fact that and and or return values from their operands, and not necessarily True or False;
• what “short-circuiting” is and how to make the best use of it;
• how short-circuiting in and and or extends to all and any; and
• some expressive use-cases of Boolean short-circuiting.
For this Pydon’t, I will assume you are familiar with what “Truthy” and “Falsy” values are in Python. If you are
not familiar with this concept, or if you would just like a quick reminder of how this works, go ahead and read
the “Truthy, Falsy, and bool” Pydon’t.
Take your time to explore this for a bit, just like we explored x or y above.
Short-circuiting
You might be asking why this distinction is relevant. It is mostly relevant because of the following property:
and and or only evaluate the right operand if the left operand is not enough to determine the result of the
operation. This is what short-circuiting is: not evaluating the whole expression (stopping short of evaluating
it) if we already have enough information to determine the final outcome.
This short-circuiting feature, together with the fact that the Boolean operators and and or return the values
of the operands and not necessarily a Boolean, means we can do some really neat things with them.
or
False or y
or evaluates to True if any of its operands is truthy. If the left operand to or is False (or falsy, for that matter)
then the or operator has to look to its right operand in order to determine the final result.
Therefore, we know that an expression like
val = False or y
will assign the value of y to val. In an if statement or in a while loop, the body of the construct runs
only if y is truthy:
>>> y = 5 # truthy value.
>>> if False or y:
...     print("Got in!")
... else:
...     print("Didn't get in...")
...
Got in!
>>> y = [] # falsy value.
>>> if False or y:
...     print("Got in 2!")
... else:
...     print("Didn't get in 2...")
...
Didn't get in 2...
Let this sit with you: if the left operand to or is False or falsy, then we need to look at the right operand to
determine the value of the or.
True or y
On the other hand, if the left operand to or is True, we do not need to take a look at y because we already
know the final result is going to be True.
Let us create a simple function that returns its argument unchanged but that, as a side effect, prints
something to the screen:
def p(arg):
    print(f"Inside `p` with arg={arg}")
    return arg
Now we can use p to take a look at the things that Python evaluates when trying to determine the value of x
or y:
>>> p(False) or p(3)
Inside `p` with arg=False
Inside `p` with arg=3
3
>>> p(True) or p(3)
Inside `p` with arg=True
True
Notice that, in the second example, p printed only once, because Python never evaluated p(3).
Short-circuiting of or expressions
Now we tie everything together. If the left operand to or is False or falsy, we know that or has to look at its
right operand and will, therefore, return the value of its right operand after evaluating it. On the other hand,
if the left operand is True or truthy, or will return the value of the left operand without even evaluating the
right operand.
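The behaviour of or can be checked without print statements. This minimal sketch uses a hypothetical helper, track, which is a variant of p that records each evaluated operand in a list. It shows two things:

• or returns one of its operands, not necessarily True or False;
• with a truthy left operand, the right operand is never evaluated.

```python
evaluated = []

def track(arg):
    """Hypothetical variant of p: record that `arg` was evaluated, return it unchanged."""
    evaluated.append(arg)
    return arg

# or returns one of its operands, not necessarily a bool.
print(0 or "default")        # default
print("given" or "default")  # given
print([] or {})              # {}

# Falsy left operand: or evaluates the right operand and returns it.
print(track(0) or track(3))  # 3
print(evaluated)             # [0, 3]

evaluated.clear()
# Truthy left operand: the right operand is never evaluated.
print(track(5) or track(3))  # 5
print(evaluated)             # [5]
```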
and
We now do a similar survey, but for and.
False and y
and gives True if both its operands are True. Therefore, if we have an expression like
val = False and y
do we need to know what y is in order to figure out what val is? No, we do not, because regardless of whether
y is True or False, val is always False:
>>> False and True
False
>>> False and False
False
If we take the False and y expressions from this example and compare them with the if expression we
wrote earlier, which was
(x and y) == (x if not x else y)
we see that, in this case, x was substituted by False, and, therefore, we have
(False and y) == (False if not False else y)
Now, the condition inside that if expression reads
not False
which we know evaluates to True, meaning that the if expression never returns y.
If the left operand is False or any other falsy value, and never looks at the right operand:
>>> p([]) and True # [] is falsy
Inside `p` with arg=[]
[]
>>> p(0) and 3242 # 0 is falsy
Inside `p` with arg=0
0
>>> p({}) and 242 # {} is falsy
Inside `p` with arg={}
{}
>>> p(0) and p(0) # both are falsy, but only the left matters
Inside `p` with arg=0
0
True and y
Now, I invite you to take a moment to work through the same reasoning, but with expressions of the form
True and y. In doing so, you should figure out that the result of such an expression is always the value of y,
because the left operand being True, or any other truthy value, doesn’t give and enough information.
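You can check your reasoning about True and y with this short sketch. It uses a hypothetical tracking helper that records which operands get evaluated. With a truthy left operand, and evaluates the right operand and returns it as-is, whatever it is.

```python
evaluated = []

def track(arg):
    """Hypothetical helper: record that `arg` was evaluated, return it unchanged."""
    evaluated.append(arg)
    return arg

# Truthy left operand: and must evaluate the right operand and returns it.
print(track(True) and track(3))   # 3
print(track("hi") and track([]))  # []
print(evaluated)                  # [True, 3, 'hi', []]

evaluated.clear()
# Falsy left operand: the right operand is skipped.
print(track(0) and track(3))      # 0
print(evaluated)                  # [0]
```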
all and any
The built-in functions all and any also short-circuit, as they are simple extensions of the behaviours provided
by and and or, respectively.
all wants to make sure that all the values of its argument are truthy, so as soon as it finds a falsy value, it
knows it’s game over. That’s why the docs say all is equivalent to the following code:
def all(it):
    for elem in it:
        if not elem:
            return False
    return True
Similarly, any is going to do its best to look for some value that is truthy. Therefore, as soon as it finds one,
any knows it has achieved its purpose and does not need to evaluate the other values.
Can you write an implementation of any that is similar to the above implementation of all and that also
short-circuits?
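You can watch the built-in all and any short-circuit without writing your own versions. Feed them a generator that records every value it hands out. This is a sketch, and the generator name tracked is hypothetical.

```python
consumed = []

def tracked(values):
    """Hypothetical generator: yield each value, remembering which ones were requested."""
    for value in values:
        consumed.append(value)
        yield value

print(all(tracked([1, 2, 0, 3, 4])))   # False
print(consumed)                        # [1, 2, 0]: stopped at the first falsy value

consumed.clear()
print(any(tracked([0, "", 7, 0, 9])))  # True
print(consumed)                        # [0, '', 7]: stopped at the first truthy value
```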
Examples in code
Now that we have taken a look at how all of these things work, we will see how to put them to good use in
actual code.
Conditionally creating a text file
Consider the following example to get my point across: imagine you are writing a function that creates a
helper .txt file, but only if the given file name ends in .txt and the file does not exist yet.
With this preamble, your function needs to do two things:
• check that the suffix of the file is .txt; and
• check if the file already exists in the filesystem.
Which do you think is faster: checking if the file name ends in .txt, or asking the operating system whether
the file exists? I would guess checking for the .txt ending is cheaper, so that’s the expression I would put
first in the code:
import pathlib
def create_txt_file(filename):
    path = pathlib.Path(filename)
    if path.suffix == ".txt" and not path.exists():
        # Create the file but leave it empty ("w" mode creates the file).
        with path.open("w"):
            pass
This means that, whenever filename does not end in .txt, the function can exit right away and doesn’t even
need to bother the operating system with asking if the file exists or not.
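You can try the function inside a temporary directory. The sketch below repeats the function, with the .txt check on path.suffix, so it runs on its own. It creates one file and skips one. It then shows that an existing .txt file is not truncated, because not path.exists() stops the function before the file is opened in "w" mode. The file names are made up for illustration.

```python
import pathlib
import tempfile

def create_txt_file(filename):
    path = pathlib.Path(filename)
    # Cheap suffix check first; path.exists() only runs for .txt names.
    if path.suffix == ".txt" and not path.exists():
        # Create the file but leave it empty.
        with path.open("w"):
            pass

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    create_txt_file(folder / "notes.txt")  # created
    create_txt_file(folder / "data.csv")   # rejected by the suffix check alone
    created = sorted(p.name for p in folder.iterdir())
    print(created)  # ['notes.txt']

    # An existing .txt file is left alone, so its contents survive.
    (folder / "notes.txt").write_text("keep me")
    create_txt_file(folder / "notes.txt")
    kept = (folder / "notes.txt").read_text()
    print(kept)  # keep me
```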
>>> enc = base64.b64encode(s)
>>> enc
b'QmFzZSA2NCBlbmNvZGluZyBhbmQgZGVjb2Rpbmcu'
>>> base64.b64decode(enc)
b'Base 64 encoding and decoding.'
Now, look at the if statement that I marked with a comment:
if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
    pass
validate is an argument to b64decode that tells the function if we should validate the string that we want
to decode or not, and then the re.fullmatch() function call does that validation, ensuring that the string
to decode only contains valid base 64 characters. In case we want to validate the string and the validation
fails, we enter the if statement and raise an error.
Notice how we first check if the user wants to validate the string and only then run the regular expression
match. We would obtain the exact same result if we swapped the operands of and, but we would spend much
more time than needed, because the regular expression would then run even when validate is False.
To show that, let us try both cases! Let’s build a string with 1001 characters, where only the last one is invalid,
and compare how much time it takes to evaluate the Boolean expression with the regex validation before and
after the Boolean validate.
import timeit
# Code that sets up the variables we need to evaluate the expression that we
# DO NOT want to be taken into account for the timing.
setup = """
import re
s = b"a"*1000 + b"*"
validate = False
"""
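The rest of the timing code is not reproduced here. A self-contained sketch of the comparison could look like the following. It repeats the setup from above and times both orderings of the operands with timeit.timeit. The timings depend on your machine, so no numbers are shown.

```python
import timeit

# Same setup as above: a 1001-byte string whose last byte is not valid base 64,
# with validation switched off.
setup = """
import re
s = b"a"*1000 + b"*"
validate = False
"""

# validate first: the regex never runs, because validate is False.
flag_first = "validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s)"
# regex first: the regex runs every time, and only then is validate checked.
regex_first = "not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s) and validate"

t_flag = timeit.timeit(flag_first, setup=setup, number=10_000)
t_regex = timeit.timeit(regex_first, setup=setup, number=10_000)
print(f"validate first: {t_flag:.4f}s")
print(f"regex first:    {t_regex:.4f}s")
```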
Conditional validation
A typical usage pattern is running some validation only if certain conditions are met.
Keeping the previous b64decode example in mind, that previous if statement could’ve been written like so:
# Modified from Lib/base64.py in Python 3.9.2
def b64decode(s, altchars=None, validate=False):
    """Decode the Base64 encoded bytes-like object or ASCII string s.
    [docstring cut for brevity]
    """
    s = _bytes_from_decode_data(s)
    if altchars is not None:
        altchars = _bytes_from_decode_data(altchars)
        assert len(altchars) == 2, repr(altchars)
        s = s.translate(bytes.maketrans(altchars, b'+/'))
    # Do we want to validate the string?
    if validate:  # <--
        # Is the string valid?
        if not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):  # <--
            raise binascii.Error('Non-base64 digit found')
    return binascii.a2b_base64(s)
Now we took the actual validation and nested it, so that we have two separate checks: one tests if we
need to do validation and the other one does the actual validation. What is the problem with this? From a
fundamentalist’s point of view, you are clearly going against the Zen of Python, which says
“Flat is better than nested.”
But from a practical point of view, you are also increasing the vertical space that your function takes up by
having a ridiculous if statement hang there. What if you have multiple conditions that you need to check
for? Will you have a nested if statement for each one of those?
This is exactly what short-circuiting is useful for! Only running the second part of a Boolean expression if it
is relevant!
    Can be a fixed string of any length, an integer, or None.
    """
    if isinstance(term, str) and self.use_encoding:
        term = bytes(term, self.encoding)
    elif isinstance(term, int) and term < 0:
        raise ValueError('the number of received bytes must be positive')
    self.terminator = term
This is a helper method from within the asynchat module. We don’t need to know what is happening outside
of this method to understand the role that short-circuiting plays in the elif statement. If term is a negative
integer, then we want to raise a ValueError to complain, but the previous if statement shows that term might
also be a string. Comparing a string with 0 raises a TypeError, so we start by checking a necessary
precondition for term < 0: that comparison only makes sense if term is an integer, so we first evaluate
isinstance(term, int) and only then run the comparison.
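Here is a minimal sketch of the same precondition pattern outside asynchat. The function check_terminator is hypothetical and only mirrors the shape of the elif condition. Because isinstance runs first, term < 0 is never attempted on a string, where it would raise a TypeError.

```python
def check_terminator(term):
    """Hypothetical validator mirroring the elif condition in set_terminator."""
    # isinstance(...) runs first; term < 0 only runs when term is an int.
    if isinstance(term, int) and term < 0:
        raise ValueError("the number of received bytes must be positive")
    return term

print(check_terminator(b"\r\n"))  # b'\r\n'
print(check_terminator(10))       # 10

# Without the precondition, comparing a string with 0 fails:
try:
    "\r\n" < 0
except TypeError as err:
    print(type(err).__name__)     # TypeError
```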
Let me show you another example from the enum module:
# From Lib/enum.py in Python 3.9.2
def _create_(cls, class_name, names, *, module=None, qualname=None, type=None, start=1):
    """
    Convenience method to create a new Enum class.
    """
    # [cut for brevity]
# we only need to take a look at the right-hand side of this `and` if `names`
# is either a tuple or a list.
raise ValueError("Empty names..? :(")
If this is a silly exercise for you, sorry about that! I just want you to be aware of the fact that when you have
many Boolean conditions, you need to be careful when checking specific configurations of what is True and
what is False.
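To make the ordering concern concrete, here is a sketch of a three-part condition in the same spirit as the enum check on names. The function looks_like_name_list is hypothetical, not enum's code. Each operand guards the next one, so the operands cannot be reordered freely.

```python
def looks_like_name_list(names):
    """Hypothetical check: is `names` a non-empty tuple/list whose first item is a str?"""
    # 1. isinstance rules out e.g. dicts, where names[0] would be a key lookup;
    # 2. the bare `names` rules out empty sequences, where names[0] raises IndexError;
    # 3. only then is it safe to look at names[0].
    # bool(...) is needed because `and` would otherwise return the falsy operand, e.g. [].
    return bool(isinstance(names, (tuple, list)) and names and isinstance(names[0], str))

print(looks_like_name_list(["RED", "GREEN"]))  # True
print(looks_like_name_list([]))                # False, and no IndexError
print(looks_like_name_list({"RED": 1}))        # False, and no KeyError
print(looks_like_name_list([("RED", 1)]))      # False
```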
'''
self.maps = list(maps) or [{}] # always at least one map
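This ChainMap object allows you to combine multiple mappings (for example, dictionaries) into a single
mapping that combines all the keys and values. Lookups search the mappings in the order they were given,
which is why cm["A"] below is 1 and not 3.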
>>> import collections
>>> a = {"A": 1}
>>> b = {"B": 2, "A": 3}
>>> cm = collections.ChainMap(a, b)
>>> cm["A"]
1
>>> cm["B"]
2
The assignment that we see in the source code ensures that self.maps always contains at least one mapping.
If we give no mapping at all to ChainMap, then list(maps) evaluates to [], which is falsy, and forces the or to
look at its right operand, returning [{}]: a list with a single dictionary that has nothing inside.
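The "value or default" idiom can be checked in isolation. Below, build_maps is a hypothetical stand-in for the assignment in ChainMap.__init__. The last line confirms that a real ChainMap created with no arguments ends up with a single empty dictionary in maps.

```python
import collections

def build_maps(*maps):
    """Hypothetical stand-in for `self.maps = list(maps) or [{}]`."""
    # list(maps) is [] (falsy) when no mappings are given, so `or` falls back to [{}].
    return list(maps) or [{}]

print(build_maps())                    # [{}]
print(build_maps({"A": 1}, {"B": 2}))  # [{'A': 1}, {'B': 2}]

print(collections.ChainMap().maps)     # [{}]
```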
>>> append(5)
[5, 5]
>>> append(5)
[5, 5, 5]
Notice the three consecutive calls append(5). We would expect the three calls to behave the same way, but the
default value of a parameter is created only once, when the function is defined. Because a list is a mutable
object, the three consecutive calls to append add the values to that same default list, which started out
empty but keeps growing.
I’ll write about mutability in more detail in future Pydon’ts, so be sure to subscribe so that you don’t miss
them.
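A common fix that relies on short-circuiting is to use None as the default and fall back to a fresh list with or. This is a sketch, and the parameter name seq is an assumption. Note the trade-off in the comments. seq or [] also replaces an empty list passed in by the caller. If the caller's own list must always be mutated, an explicit `if seq is None:` check is the safer choice.

```python
def append(elem, seq=None):
    # `seq or []` gives a brand-new list whenever seq is None (or any other
    # falsy value), so no list is shared between calls.
    seq = seq or []
    seq.append(elem)
    return seq

print(append(5))  # [5]
print(append(5))  # [5]
print(append(5))  # [5]

mine = [1, 2]
print(append(3, mine))  # [1, 2, 3]
print(mine)             # [1, 2, 3]: a non-empty list passed in is mutated in place

empty = []
append(3, empty)
print(empty)            # []: an empty list is falsy, so `or` swapped it for a new one
```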
# Prints 'Found odd number 35.'
Isn’t this neat?
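Here is a self-contained sketch of the witness idea (Python 3.8+, for :=). The list of numbers is made up for illustration. any stops at the first truthy value, so the name bound by := inside the generator expression holds the first element that satisfied the predicate: the witness.

```python
numbers = [2, 4, 35, 6, 41]

# (n := x) binds the current element to n, then % 2 tests it for oddness.
# any() stops at the first odd number, so n is left holding that witness.
if any((n := x) % 2 for x in numbers):
    print(f"Found odd number {n}.")  # Found odd number 35.
```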
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Be mindful when you order the left and right operands to the and and or expressions, so that
you can make the most out of short-circuiting.”
This Pydon’t showed you that:
• and and or return the value of one of their operands, and not necessarily a Boolean value;
• both Boolean operators short-circuit:
– and only evaluates the right operand if the left operand is truthy;
– or only evaluates the right operand if the left operand is falsy;
• the built-in functions all and any also short-circuit;
• short-circuiting also happens in chained comparisons, because those contain an implicit and operator;
• using short-circuiting can save you a lot of computational time;
• nested structures of if statements can, sometimes, be flattened and simplified if we use short-
circuiting with the correct ordering of the conditions;
• it is customary to use short-circuiting to test some preconditions before applying a test to a variable;
• another great use-case for short-circuiting is to assign default values to variables and function arguments,
especially if the default value is a mutable value; and
• short-circuiting, together with the walrus operator :=, can be used to find a witness value with respect
to a predicate function.
References
• Python 3 Documentation, The Python Standard Library, Built-in Types, Boolean Operations – and, or, not, https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not [last accessed 31-05-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, all, https://docs.python.org/3/library/functions.html#all [last accessed 26-05-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Functions, any, https://docs.python.org/3/library/functions.html#any [last accessed 26-05-2021];
• Stack Overflow, “Does Python support short-circuiting”, https://stackoverflow.com/a/14892812/2828287 [last accessed 31-05-2021];
• Python 3 Documentation, The Python Standard Library, base64, https://docs.python.org/3/library/base64.html [last accessed 01-06-2021];
• Python 3 Documentation, The Python Standard Library, asynchat, https://docs.python.org/3/library/asynchat.html [last accessed 01-06-2021];
• Python 3 Documentation, The Python Standard Library, enum, https://docs.python.org/3/library/enum.html [last accessed 01-06-2021];
• Python 3 Documentation, The Python Standard Library, collections.ChainMap, https://docs.python.org/3/library/collections.html#collections.ChainMap [last accessed 01-06-2021];
• Python 3 Documentation, The Python Standard Library, cgitb, https://docs.python.org/3/library/cgitb.html [last accessed 01-06-2021];
• Real Python, “How to Use the Python or Operator”, https://realpython.com/python-or-operator/ [last accessed 01-06-2021].
The power of reduce
Introduction
In this Pydon’t I’ll talk about reduce, a function that used to be a built-in function and that was moved to the
functools module with Python 3.
Throughout all of the Pydon’ts I have been focusing only on Python features that you can use without having
to import anything, so in that regard this Pydon’t will be a little bit different.
In this Pydon’t, you will:
• see how reduce works;
• learn about the relationship between reduce and for loops;
• notice that reduce hides in a handful of other built-in functions we all know and love;
• learn about a neat use-case for reduce;
def sum(iterable, start=0):
    acc = start
    for elem in iterable:
        acc = acc + elem
    return acc
Now, our sum function can start adding up at a different value, and it spells out the addition as
acc = acc + elem instead of using the modified assignment +=. Let us now stack this alternative
implementation side by side with the original reduce implementation:
def sum(iterable, start=0):     # def reduce(function, iterable, initial_value):
    acc = start                 #     result = initial_value
    for elem in iterable:       #     for value in iterable:
        acc = acc + elem        #         result = function(result, value)
    return acc                  #     return result
Can you see how they are the same thing?
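To check the correspondence yourself, this sketch puts the loop pattern from the right-hand column into a function and compares it with functools.reduce and the built-in sum. The function is named my_reduce here so that it does not shadow anything.

```python
import functools
import operator

def my_reduce(function, iterable, initial_value):
    """The loop pattern from the right-hand column above."""
    result = initial_value
    for value in iterable:
        result = function(result, value)
    return result

numbers = [4, 8, 15, 16, 23, 42]
print(my_reduce(operator.add, numbers, 0))         # 108
print(functools.reduce(operator.add, numbers, 0))  # 108
print(sum(numbers))                                # 108
```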
Other common reductions
And there is more, of course. If we use operator.mul (for multiplication), then we get the math.prod function
that we can use to multiply all the numbers in an iterable:
>>> from math import prod
>>> prod(range(1, 11)) # 10!
3628800
>>> reduce(operator.mul, range(1, 11))
3628800
What if you have a bunch of strings that you want to piece together? For example, what if you have a list of
words that you want to put back together, separated by spaces?
>>> words = ["Do", "I", "like", "reductions?"]
>>> " ".join(words)
'Do I like reductions?'
If we define “string addition” to be the concatenation of the two strings, but with a space in the middle, then
we get the same thing:
>>> reduce(lambda s1, s2: s1 + " " + s2, words)
'Do I like reductions?'
Now, please don’t get me wrong. I am not suggesting you start using reduce when you need to join strings.
I am just trying to show you how these patterns are so common and appear in so many places, even if you
don’t notice them.
Why bother?
Why should you bother with knowing that reduce exists, and how it works? Because that is what “learning
Python” means: you need to be exposed to the library, to the built-ins, you need to learn new algorithms, new
ways of doing things, new tools.
reduce is another tool you now have in your toolbelt. Maybe it is not something you will use every day. Maybe
it is something you will use once a year. Or even less. But when the time comes, you can use it, and your
code will be better for that: because you know how to use the right tool for the job.
People learn a lot by building knowledge on top of the things that they already learned elsewhere… And the
more you learn elsewhere, the more connections with different things you can make, and the more things you
can discover. Maybe this article does nothing for you, but maybe this article was the final push you needed
to help something else click. Or maybe it feels irrelevant now, but in 1 week, 1 month, or 1 year, something
else will click because you took the time to learn about reduce and to understand how it relates to all these
other built-in functions.
Far-fetched reductions
The reductions above are fairly “normal”, but we can do all kinds of interesting things with reduce! Skip this
section altogether if you are starting to feel confused or repulsed by reductions; I don’t want to damage your
relationship with reduce beyond repair. This section contains some reductions that are – well, how to put
this nicely..? – not necessarily suitable for production.
The identity element…
…or lack thereof
We have seen some reductions already and, if you were brave enough, you even took a sneak peek at some
crazy reductions in the previous section. However, up until now, I have been (purposefully) not giving much
attention to the third argument to reduce. Let us discuss it briefly.
First, why do we need a third argument to reduce? Well… because we like things to work:
>>> from functools import reduce
>>> import operator
>>> sum([1, 2])
3
>>> reduce(operator.add, [1, 2])
3
>>> sum([1])
1
>>> reduce(operator.add, [1])
1
>>> sum([])
0
>>> reduce(operator.add, [])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: reduce() of empty sequence with no initial value
From a strictly practical point of view, the third argument to reduce exists so that reduce can know what to
return in case the given iterable is empty. This means that, in general, you don’t need to worry about that
argument if you know your iterables are never going to be empty…
The documentation is quite clear with regard to how it uses this third argument, which it calls
initializer:
“If the optional initializer is present, it is placed before the items of the iterable in the
calculation, and serves as a default when the iterable is empty. If initializer is not given and
iterable contains only one item, the first item is returned.” [functools.reduce Python 3 docs,
8th June 2021].
So, in practical terms, you only really need the initializer to cover the case of an empty iterable, and
therefore you should provide it whenever reduce might receive one.
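A non-commutative operation like subtraction shows what "placed before the items" means, because the position of the initializer changes the result. This short sketch uses only functools.reduce and operator.sub.

```python
from functools import reduce
import operator

# With an initializer, the computation is ((10 - 1) - 2) - 3.
print(reduce(operator.sub, [1, 2, 3], 10))  # 4
# Without one, the first item plays that role: (1 - 2) - 3.
print(reduce(operator.sub, [1, 2, 3]))      # -4
# A single item and no initializer: the item is returned untouched.
print(reduce(operator.sub, [7]))            # 7
# An empty iterable: the initializer is the result.
print(reduce(operator.sub, [], 10))         # 10
```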
Again, from a very practical perspective, the identity element is a special element with a very special
behaviour: if the iterable is not empty, including the identity element or leaving it out must give exactly the
same result. In other words, when in the presence of other values, the identity element should have no
effect at all.
For example, if we are multiplying a list of numbers, what is the identity element that we should feed reduce
with? What is the number that, when multiplied by some other numbers, does exactly nothing? It is 1:
>>> from functools import reduce
>>> reduce(operator.mul, range(4, 10))
60480
>>> reduce(operator.mul, range(4, 10), 1)
60480
For the built-in reductions, you can generally figure out what the identity element is by trying to call the
reduction with an empty iterable:
>>> sum([])
0
>>> import math
>>> math.prod([])
1
>>> all([])
True
>>> any([])
False
>>> max([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
max and min are interesting reductions because, from the mathematical point of view, they have suitable
identity elements:
• for max, the identity element is -∞; and
• for min, the identity element is ∞.
Why is that? Again, because these are the values that will not impact the final result when mixed in with
other numbers.
Take a look at the following excerpt from my session:
>>> max(float("-inf"), 10)
10
>>> max(float("-inf"), -132515632534250)
-132515632534250
>>> max(float("-inf"), 67357321)
67357321
These six lines of the session show three instances of how calling max with minus infinity as one of the
arguments always returns the other one, because no number is smaller than minus infinity.
However, max and min raise an error if you call them with empty iterables, even though there is an identity
element that you could use.
>>> max([])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
Maybe they do this so that people don’t have to deal with infinities in their programs? I honestly don’t know!
Edit: I went online and asked people, and the answer that made the most sense to me is that max and min
can be used with any comparable objects, and for other objects, the infinities might make absolutely no
sense.
For example, max("abc", "da") returns "da", and when comparing strings it really makes no sense to add
float("-inf") to the mix.
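If you do want an identity element for max over numbers, you can supply it yourself. There are two ways:

• pass it as the initializer of reduce;
• pass it through the default keyword argument, which max and min return only when the iterable is empty.

A short sketch:

```python
from functools import reduce

# -inf is the identity for max on numbers: it never beats a real number.
print(reduce(max, [3, -7, 12], float("-inf")))  # 12
print(reduce(max, [], float("-inf")))           # -inf

# max/min accept a `default` that is returned only for empty iterables.
print(max([], default=float("-inf")))           # -inf
print(min([], default=float("inf")))            # inf
print(max([3, -7, 12], default=float("-inf")))  # 12
```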
Examples in code
I looked for usages of reduce in the Python Standard Library and I didn’t find many, but I found one usage
pattern (in two different places) and I just found it to be really elegant, and that’s what I am sharing with you
here.
Other than that, even if you are not explicitly using reduce, just remember that functions like sum, math.prod,
max, min, all, any, etc, are pervasive in our code and, whether you like it or not, you are using reductions in
your own code.
>>> class C:
...     pass
...
Now, let me create a couple of instances and nest them:
>>> c = C()
>>> c.one = C()
>>> c.one._2 = C()
>>> c.one._2.c = C()
>>> c.one._2.c._4 = 42
If I have the base instance c, and if I have the names of the successive attributes that lead to 42, how do I
get there? Well, instead of using dict.get, we can use getattr:
>>> attrs = ["one", "_2", "c", "_4"]
>>> reduce(getattr, attrs, c)
42
I’ll be writing about getattr soon, so be sure to subscribe to stay tuned.
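The same pattern works for nested dictionaries, with indexing instead of attribute access. The dictionary below is made up to mirror the nested attributes above. operator.getitem(d, key) is just d[key]. dict.get called on the class takes the dictionary as its first argument, so it fits reduce too, as long as every key exists.

```python
from functools import reduce
import operator

nested = {"one": {"_2": {"c": {"_4": 42}}}}
keys = ["one", "_2", "c", "_4"]

# Equivalent to nested["one"]["_2"]["c"]["_4"].
print(reduce(operator.getitem, keys, nested))  # 42
print(reduce(dict.get, keys, nested))          # 42
```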
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Reductions are classical techniques that you use frequently, even if you do not realise you are
doing so!”
This Pydon’t showed you that:
• reduce takes an iterable of objects and applies a function successively, to build a single final object;
• reduce was a built-in function in Python 2 and in Python 3 it lives in the functools module;
• reductions can be converted to for loops and back following a very well-defined pattern;
• built-in functions like sum, max, min, any, and all, are reductions;
• a reduction can work with an optional third argument, to initialise the process, and that element is
supposed to be the identity element of the function you are using;
• not all functions have identity elements;
• the operator module allows you to access built-in operations, like addition and subtraction, and pass
them around your code; and
• reduce can be used to reach programmatically inside nested dictionaries or class attributes.
References
• Python 3 Documentation, The Python Standard Library, Built-in Functions, https://docs.python.org/3/library/functions.html [last accessed 07-06-2021];
• Python 2 Documentation, The Python Standard Library, Built-in Functions, reduce, https://docs.python.org/2.7/library/functions.html#reduce [last accessed 06-06-2021];
• Python 3 Documentation, The Python Standard Library, functools.reduce, https://docs.python.org/3/library/functools.html#functools.reduce [last accessed 06-06-2021];
• Python 3 Documentation, The Python Standard Library, operator, https://docs.python.org/3/library/operator.html [last accessed 07-06-2021];
• Artima Weblogs, “The fate of reduce() in Python 3000” by Guido van Rossum, https://www.artima.com/weblogs/viewpost.jsp?thread=98196 [last accessed 06-06-2021];
• Real Python, “Python’s reduce(): From Functional to Pythonic Style”, https://realpython.com/python-reduce-function/ [last accessed 06-06-2021];
• Stack Overflow, “Why don’t max and min return the appropriate infinities when called with empty iterables?”, https://stackoverflow.com/q/67894680/2828287 [last accessed 08-06-2021].
Usages of underscore
Introduction
In this Pydon’t we will take a look at all the use cases there are for _ in Python. There are a couple of places
where _ has a very special role syntactically, and we will talk about those places. We will also talk about the
uses of _ that are just conventions people follow, and that allow one to write more idiomatic code.
In this Pydon’t, you will:
• learn about the utility of _ in the Python REPL;
• learn what _ does when used as a prefix and/or suffix of a variable name:
– a single underscore used as a suffix;
– a single underscore used as a prefix;
– double underscore used as a prefix;
– double underscore used as a prefix and suffix;
• see the idiomatic usage of _ as a “sink” in assignments;
• understand how that was extended to _’s role in the new match statement;
• see the idiomatic usage of _ in localising strings; and
• learn how to use _ to make your numbers more readable.
Prefixes and suffixes for variable names
Single underscore as a suffix
As you know, some words have a special meaning in Python and are therefore called keywords. This
means we cannot use those names for our variables. Similarly, Python defines a series of built-in functions
that are generally very useful and ideally we would like to avoid using variable names that match those built-in
names.
However, there are occasions in which the perfect variable name is either one of those keywords or one of
those built-in functions. In those cases, it is common to use a single _ as a suffix to prevent clashes.
For example, in statistics, there is a random distribution called the “exponential distribution” that depends
on a numeric parameter, and that parameter is typically called “lambda” in the mathematical literature. So,
when the random module implemented that distribution in random.expovariate, the implementers would ideally
have liked to use the word lambda as the parameter to random.expovariate, but lambda is a reserved keyword
and that would raise a SyntaxError:
>>> def expovariate(lambda):
File "<stdin>", line 1
def expovariate(lambda):
^
SyntaxError: invalid syntax
Instead, they could have named the parameter lambda_. (The implementers ended up going with lambd,
however.)
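Here is a quick sketch of the convention. keyword.iskeyword confirms that lambda is reserved and lambda_ is not. The function exponential_pdf is hypothetical and only illustrates a trailing-underscore parameter. The standard library's random.expovariate, as mentioned, names its parameter lambd instead.

```python
import keyword
import math
import random

print(keyword.iskeyword("lambda"))   # True
print(keyword.iskeyword("lambda_"))  # False

def exponential_pdf(x, lambda_):
    """Hypothetical helper: density of the exponential distribution with rate lambda_."""
    return lambda_ * math.exp(-lambda_ * x) if x >= 0 else 0.0

print(exponential_pdf(0, lambda_=2.0))  # 2.0

# The real standard library function uses the name `lambd`.
sample = random.expovariate(lambd=2.0)
```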
There are many examples in the Python Standard Library where the implementers opted for the trailing
underscore. For example, in the code for IDLE (the IDE that comes by default with Python and that is
implemented fully in Python) you can find this function:
# From Lib/idlelib/help.py in Python 3.9.2
def handle_starttag(self, tag, attrs):
    "Handle starttags in help.html."
    class_ = ''
    for a, v in attrs:
        if a == 'class':
            class_ = v
    # Truncated for brevity...
Notice the class_ variable that is defined and updated inside the loop. “class” would be the obvious variable
name here because we are dealing with HTML classes, but class is a reserved keyword that we use to define,
well, classes… And that’s why we use class_ here!
Let me start by explaining the convention: when you define a name that starts with a single underscore, you
are letting other programmers know that such a name refers to something that is for internal use only, and
that outside users shouldn’t mess around with.
For example, suppose that you are implementing a framework for online shops, and you are now writing the
part of the code that will fetch the price of an item. You could write a little function like so:
prices = {
    "jeans": 20,
    "tshirt": 10,
    "dress": 30,
}
def get_price(item):
    return prices.get(item, None)
Now, shops nowadays can’t do business without having sales from time to time, so you add a parameter to
your function so that you can apply discounts:
def get_price(item, discount=0):
    p = prices.get(item, None)
    if p is not None:
        return (1 - discount)*p
    else:
        return p
Now all is good, except you think it might be a good idea to validate the discount that the function is trying
to apply, so that discounts are never negative or greater than 100%. You could do that in the main function,
or you could write a helper function to do it for you, which makes sense if you will need to verify that
discount amounts are correct in a variety of places.
So, you write your helper function:
def valid_discount(discount):
    return 0 <= discount <= 1
By the way, if you want to learn more about the fact that Python allows the chaining of comparisons, like
what you see above, you can read the Pydon’t on chaining comparison operators.
Now you have a way to validate discounts and you can use that:
def get_price(item, discount=0):
    if not valid_discount(discount):
        raise ValueError(f"Trying to apply an illegal discount on {item}.")
    p = prices.get(item, None)
    if p is not None:
        return (1 - discount)*p
    else:
        return p
Perfect! The codebase for your online shop management framework is well on its way.
Now imagine, for a second, that you are a user of your framework, and not an implementer. You will prob-
ably install the framework from PyPI, with pip, or maybe directly from GitHub. But when you do, and when
you import the code to start using it, you will import the get_price and the valid_discount functions.
Now, you need the get_price function, but you don’t need the valid_discount function, because the whole
framework already protects the user from illegal discounts and negative prices and whatnot! In other words, the
valid_discount function is more relevant to the internals of the framework than to users of the framework.
Except the user probably doesn’t know that, because the user sees the valid_discount function and it is
fair to assume that the user will think they have to use that function to validate discounts for themselves…
How could they know they don’t need to?
One solution would be for you to follow the convention we just started discussing! If you name your function
just a tad differently:
def _valid_discount(discount):
    return 0 <= discount <= 1
The user of the framework immediately understands “oh, I don’t have to worry about this function because
its name starts with a single underscore”. Not only that, but Python even helps users not worry about those
functions with leading underscores.
Go ahead and write the following in your onlineshop.py file:
## onlineshop.py
def _valid_discount(discount):
    return 0 <= discount <= 1
prices = {
"jeans": 20,
"tshirt": 10,
"dress": 30,
}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rodri\Documents\mathspp\onlineshop.py", line 13, in get_price
raise ValueError(f"Trying to apply an illegal discount on {item}.")
ValueError: Trying to apply an illegal discount on jeans.
Notice how both functions appear to be working just fine, and notice that we got an error on the last call
because 1.3 is too big of a discount, so the _valid_discount function said it wasn’t valid.
Let us check it for ourselves:
>>> _valid_discount(1.3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name '_valid_discount' is not defined
We get a NameError because the _valid_discount function isn’t defined… Because it was never imported!
The function was not imported into your code, even though the original code can still use it internally. If you
really need to access _valid_discount, then you either import it explicitly, or you just import the module
name and then access it with its dotted name:
>>> from onlineshop import _valid_discount
>>> _valid_discount(0.5)
True
>>> import onlineshop
>>> onlineshop._valid_discount(1.3)
False
This mechanism also works with variables, as long as their names start with an underscore. Go
ahead and rename the prices variable to _prices, close the REPL, open it again, and run from onlineshop
import *. _prices will not be defined!
So, on the one hand, notice that a leading underscore really is an indication of what things you should and
shouldn’t be concerned with when using code written by others. On the other hand, the leading underscore
is just an indication, and it won’t prevent others from accessing the names that you write with a leading
underscore.
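If you want to see the wildcard-import rule without creating files, here is a self-contained sketch. It assumes a module named onlineshop_sketch, built in memory as a stand-in for onlineshop.py, and it emulates from ... import * by running that import statement with exec in a fresh dictionary:

```python
import sys
import types

# Hypothetical stand-in for onlineshop.py, built in memory so the example
# runs on its own; normally this code would live in a file on disk.
SOURCE = '''
def _valid_discount(discount):
    return 0 <= discount <= 1

prices = {"jeans": 20, "tshirt": 10, "dress": 30}

def get_price(item, discount=0):
    if not _valid_discount(discount):
        raise ValueError(f"Trying to apply an illegal discount on {item}.")
    p = prices.get(item, None)
    if p is not None:
        return (1 - discount) * p
    return p
'''

onlineshop = types.ModuleType("onlineshop_sketch")
exec(SOURCE, onlineshop.__dict__)
sys.modules["onlineshop_sketch"] = onlineshop

# Emulate "from onlineshop_sketch import *" in a fresh namespace.
user_namespace = {}
exec("from onlineshop_sketch import *", user_namespace)

print(sorted(k for k in user_namespace if not k.startswith("__")))
# ['get_price', 'prices']
print(onlineshop._valid_discount(1.3))  # the dotted name still works: False
```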
Finally, there is one other way of controlling what gets imported when someone uses the * to import
everything from your module: you can use the __all__ variable to specify the names that should be
imported on that occasion.
Go ahead and add the following line to the top of your onlineshop.py file:
__all__ = ("get_price", "_valid_discount")
After you do that, close your REPL and reopen it:
>>> from onlineshop import *
>>> get_price
<function get_price at 0x0000029410907430>
>>> _valid_discount
<function _valid_discount at 0x0000029410907280>
>>> prices
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'prices' is not defined
Notice that all the names inside __all__ were imported, regardless of whether they start with a single
underscore, and the names that were not listed were not imported. In my example, my variable was named prices
(so it didn’t even have a leading underscore!) and it was not imported.
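Here is a similar self-contained sketch for __all__. The helper star_import_names is hypothetical, and the module onlineshop_all_sketch is built in memory as a stand-in for onlineshop.py with the __all__ line at the top:

```python
import sys
import types


def star_import_names(source, module_name):
    """Return the names that "from <module_name> import *" would bind.

    Hypothetical helper: the module is built in memory from `source`.
    """
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the import
    return set(namespace)


SOURCE = '''
__all__ = ("get_price", "_valid_discount")

def _valid_discount(discount):
    return 0 <= discount <= 1

prices = {"jeans": 20, "tshirt": 10, "dress": 30}

def get_price(item, discount=0):
    return prices.get(item)  # body trimmed for this sketch
'''

print(sorted(star_import_names(SOURCE, "onlineshop_all_sketch")))
# ['_valid_discount', 'get_price']
```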
This __all__ variable is the perfect segue into the next subsection:
What does this mean?
First, let us see this in action. Modify the onlineshop.py file so that our code now belongs to a class called
OnlineShop:
## onlineshop.py
class OnlineShop:
    __prices = {
        "jeans": 20,
        "tshirt": 10,
        "dress": 30,
    }
Go ahead and look for the names of the things we defined. Can you find the _valid_discount and get_price
functions? What about __prices? You won’t be able to find __prices in that list, but the very first item of
the list is _OnlineShop__prices, which looks awfully related.
Remember when I said that a double leading underscore is used to avoid name collisions? Well, there’s
a high chance that people might want to create a variable named prices if they extend your online shop
framework, and you might still need your original prices variable, so you have two options:
• give a huge, very complicated, name to your prices variable, so that it becomes highly unlikely that
others will create a variable with the same name; or
• use __prices to ask Python to mangle the variable name, to avoid future collisions.
Going with the second option meant that Python took the original variable name, which was __prices, and
prepended the class name to it, plus an additional leading underscore, so that users still know they should
leave that name alone. That is the explicit name you can use to reach that variable from outside the class:
>>> shop._OnlineShop__prices
{'jeans': 20, 'tshirt': 10, 'dress': 30}
This name mangling facility works for both variables and functions, so you could have a __valid_discount
method that would look like _OnlineShop__valid_discount from outside of the class, for example.
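To see why the mangling prevents collisions, here is a sketch assuming an OnlineShop class with a get_price method (modelled on the earlier function) and a hypothetical subclass, DiscountShop, that happens to define its own __prices:

```python
class OnlineShop:
    # Double leading underscore: Python mangles this to _OnlineShop__prices.
    __prices = {"jeans": 20, "tshirt": 10, "dress": 30}

    def __valid_discount(self, discount):
        # Mangled to _OnlineShop__valid_discount.
        return 0 <= discount <= 1

    def get_price(self, item, discount=0):
        if not self.__valid_discount(discount):
            raise ValueError(f"Trying to apply an illegal discount on {item}.")
        p = self.__prices.get(item)
        return None if p is None else (1 - discount) * p


class DiscountShop(OnlineShop):
    # Hypothetical subclass with its own __prices; it is mangled to
    # _DiscountShop__prices, so it does not overwrite the parent's variable.
    __prices = {"jeans": 1}


shop = DiscountShop()
print(shop.get_price("jeans"))     # 20: OnlineShop still sees its own prices
print(shop._OnlineShop__prices)    # {'jeans': 20, 'tshirt': 10, 'dress': 30}
print(shop._DiscountShop__prices)  # {'jeans': 1}
print(hasattr(shop, "__prices"))   # False
```

Each class body mangles __prices with its own class name, so the subclass’s variable lands in _DiscountShop__prices while the methods of OnlineShop keep reading _OnlineShop__prices.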
It is highly likely that you won’t have the need to use double leading underscores in your code, but I couldn’t
just ignore this use case!
Underscore as a sink
One of my favourite use cases for the underscore is when we use the underscore as the target for an assign-
ment. I am talking about the times we use _ as a variable name in an assignment.
It is a widespread convention that using _ as a variable name means “I don’t care about this value”. Having
said this, you should be asking yourself this: If I don’t care about a value, why would I assign it in the first
place? Excellent question!
Doing something like
_ = 3 # I don't care about this 3.
is silly. Using the underscore as a sink (that is, as the name of a variable that will hold a value that I do not
care about) is useful in other situations.
Unpacking
I have written at length about unpacking in other Pydon’ts:
• “Unpacking with starred assignments”
• “Deep unpacking”
Unpacking is a feature that lets you, well, unpack multiple values into multiple names at once. For example,
here is how you would split a range into its first and last items, as well as into the middle part:
>>> first, *mid, last = range(0, 10)
>>> first
0
>>> mid
[1, 2, 3, 4, 5, 6, 7, 8]
>>> last
9
Isn’t this neat? Well, it is! But what if you only cared about the first and last items? There are various options,
naturally, but I argue that the most elegant one uses _ as a sink for the middle part:
>>> first, *_, last = range(0, 10)
>>> first
0
>>> last
9
Why is this better than the alternative below?
>>> sequence = range(0, 10)
>>> first, last = sequence[0], sequence[-1]
Obviously, sequence = range(0, 10) is just an example of a sequence. If I knew in advance this were the
sequence I’d be using, then I would assign first = 0 and last = 9 directly. But for generic sequences,
the two approaches do not always behave the same.
Can you figure out when? I talk about that in this Pydon’t.
The behaviour is different when sequence has only one element: the unpacking raises a ValueError, whereas
the indexing assigns that single element to both first and last. Because they behave differently, there might
be cases where you have to use one of the two alternatives, but when you are given the choice, the unpacking
looks more elegant and better conveys the intent to split the sequence into its parts.
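Here is a quick way to check that difference. The functions ends_by_unpacking and ends_by_indexing are hypothetical helpers wrapping the two alternatives:

```python
def ends_by_unpacking(sequence):
    """Hypothetical helper: first and last items via starred unpacking."""
    first, *_, last = sequence
    return first, last


def ends_by_indexing(sequence):
    """Hypothetical helper: first and last items via indexing."""
    return sequence[0], sequence[-1]


print(ends_by_unpacking(range(0, 10)))  # (0, 9)
print(ends_by_indexing(range(0, 10)))   # (0, 9)

# With a single element, indexing reuses that element for both ends...
print(ends_by_indexing([42]))  # (42, 42)

# ...while the starred unpacking needs at least two items and refuses.
unpacking_error = None
try:
    ends_by_unpacking([42])
except ValueError as error:
    unpacking_error = error
print(repr(unpacking_error))
```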
Of course _ is a valid variable name and you can ask for its value:
>>> first, *_, last = range(0, 10)
>>> _
[1, 2, 3, 4, 5, 6, 7, 8]
But when I see the *_ in the assignment, I immediately understand the semantics of that assignment as
“ignore the middle part of the range”.
This can also be used when you are unpacking some structure, and only care about specific portions of the
structure. You could use indexing to access the specific information you want:
>>> colour_info = ("lightyellow", (255, 255, 224))
>>> blue_channel = colour_info[1][2]
>>> blue_channel
224
But if the colour_info variable is malformed, you will have a hard time figuring that out. Instead, using
unpacking, you can assert that the structure is correct and at the same time only access the value(s) that
matter:
>>> colour_info = ("lightyellow", (255, 255, 224))
>>> _, (_, _, blue_channel) = colour_info
>>> blue_channel
224
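The sketch below contrasts both approaches on a hypothetical malformed entry whose colour has four channels instead of three:

```python
def blue_by_indexing(colour_info):
    return colour_info[1][2]


def blue_by_unpacking(colour_info):
    # The pattern also checks the shape: a name plus exactly three channels.
    _, (_, _, blue_channel) = colour_info
    return blue_channel


good = ("lightyellow", (255, 255, 224))
# Hypothetical malformed entry: an extra fourth channel snuck in.
bad = ("lightyellow", (255, 255, 224, 0))

print(blue_by_indexing(good), blue_by_unpacking(good))  # 224 224
print(blue_by_indexing(bad))  # 224: indexing does not notice the problem

shape_error = None
try:
    blue_by_unpacking(bad)
except ValueError as error:
    shape_error = error  # unpacking notices the extra value
print(repr(shape_error))
```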
>>> v = 10
>>> match v:
...     case _:
...         print(_)
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
NameError: name '_' is not defined
If you want to match anything else and be able to refer to the original value, then you need to use a valid
target name:
>>> v = 10
>>> match v:
...     case wtv:
...         print(wtv)
...
10
String localisation
Another niche use case for the underscore, but one that I find absolutely lovely, is for when you need to localise
your programs. Localising a program means making it suitable for different regions/countries. When you do
that, one of the things that you have to do is translate the strings in your program, so that they can be read
in many different languages.
How would you implement a mechanism to enable your program to output in (arbitrarily many) different
languages? Do think about that for a second; it is a nice challenge! Assume you can’t use modules built
specifically for localisation.
Whatever you do, for example a function call or a dictionary lookup, is going to happen in many places
and is going to generate a lot of noise. Going from
print("Hello, world!")
to
print(translate("Hello, world!"))
may look harmless, but in a program with many strings, all the translate calls will add a lot of visual clutter.
So, it is common practice to create an alias to a function like the translate function and call it _. Then,
localising a string doesn’t add much visual clutter:
print(_("Hello, world!"))
This is just a convention, but it is so common that it is even mentioned in the gettext docs, the document-
ation for a module designed specifically to help your programs handle multiple (natural) languages.
When I first found this usage of _ I was very confused. I found it when looking at the source code for the
argparse module. Because argparse deals with command-line interfaces, it makes sense that its inner
workings are localised, so that its command-line messages match the language of the command-line itself.
I still remember the very first time I saw it; I was looking at these two lines:
if prefix is None:
prefix = _('usage: ')
I was very confused by the _('usage: ') part of the assignment, but eventually I found the import
statement in that file:
from gettext import gettext as _, ngettext
And I realised they were setting _ as an alias for gettext.
>>> lightyellow = 0xff_ff_e0
>>> peachpuff = 0xff_da_b9 # I didn't invent this name!
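The underscores are only a visual aid: Python ignores them when reading the literal, so the values are the same as without them. A quick sketch (one_billion and mask are illustrative names):

```python
lightyellow = 0xff_ff_e0     # same value as 0xffffe0
one_billion = 1_000_000_000  # decimal digits grouped in thousands
mask = 0b1111_0000           # binary digits grouped in nibbles

print(lightyellow, one_billion, mask)  # 16777184 1000000000 240
print(f"{one_billion:_}")              # 1_000_000_000
print(int("1_000"))                    # 1000: int() also accepts underscores
```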
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Coding conventions exist to make our lives easier, so it is worth learning them to make our code
more expressive and idiomatic.”
This Pydon’t showed you that:
• you can recover the last value of an expression in the Python REPL with _;
• _ has quite an impact on names when used as a prefix/suffix:
– name_ is a common choice for when name is a reserved keyword;
– _name is a convention to signal that name is an internal name and that users probably shouldn’t
mess with it;
* _name won’t be imported if someone uses a from mymodule import * wildcard import; and
* this can be overridden if _name is added to the __all__ variable in mymodule.
– dunder names (that start and end with double underscore) refer to Python’s internals and allow
you to interact with Python’s syntax;
– __name is used inside classes to prevent name collisions, when you want to use an internal variable
with a name that you are afraid users might override by mistake;
• _ is used in an idiomatic fashion as a sink in assignments, especially
– when unpacking several values, when only some are of interest;
– when iterating in a for loop where we don’t care about the iteration number;
• the new match statement uses _ as the “match all” case and makes it a true sink because _ can’t be
used to access the original value;
• _ is often used as an alias for localisation functions because of its low visual impact;
• numbers in different bases (decimal, binary, …) can have their digits split by underscores to improve
readability. For example, compare 999999999 with 999_999_999.
References
• Python 3 Documentation, The Python Tutorial, Modules, “Importing * From a Package”, https://docs.python.org/3/tutorial/modules.html#importing-from-a-package [last accessed 14-06-2021];
• Python 3 Documentation, The Python Standard Library, gettext, https://docs.python.org/3/library/gettext.html [last accessed 14-06-2021];
• Python 3 Documentation, The Python Standard Library, random.expovariate, https://docs.python.org/3/library/random.html#random.expovariate [last accessed 14-06-2021];
• Weisstein, Eric W. “Exponential Distribution.” From MathWorld – A Wolfram Web Resource. https://mathworld.wolfram.com/ExponentialDistribution.html [last accessed 14-06-2021];
• Bader, Dan, “The Meaning of Underscores in Python”, https://dbader.org/blog/meaning-of-underscores-in-python [last accessed 14-06-2021];
• Datacamp, “Role of Underscore(_) in Python”, https://www.datacamp.com/community/tutorials/role-underscore-python [last accessed 14-06-2021];
• Hackernoon, “Understanding the Underscore( _ ) of Python”, https://hackernoon.com/understanding-the-underscore-of-python-309d1a029edc [last accessed 14-06-2021].
__name__ dunder attribute
Introduction
In this Pydon’t we will take a look at the __name__ attribute. If you Google it, you will find a ton of results
explaining one use case of the __name__ attribute, so in this Pydon’t I’ll try to tell you about another couple
of use cases so that you learn to use __name__ effectively in your Python programs.
In this Pydon’t, you will:
• learn about the idiomatic usage of __name__ to create “main” functions in Python;
• learn about the read-only attribute __name__ that many built-in objects get;
• see how __name__ is used in a convention involving logging; and
• see some code examples of the things I will be teaching.
What is __name__?
__name__ is a special attribute in Python. It is special because it is a dunder attribute, which is just the
name that we give, in Python, to attributes whose names start and end with a double underscore. (I explain
in greater detail what a dunder attribute/method is in a previous Pydon’t.)
You can look __name__ up in the Python documentation, and you will find two main results that we will cover
here. One of the results talks about __name__ as a module attribute, while the other result talks about
__name__ as an attribute of built-in object types.
Also, notice that the value printed matches the name of the file it came from. Here, we see that __name__
was automatically set to the name of the file it was in (print_name) when the code from print_name was
imported from importer.
So, we see that __name__ takes on different values depending on whether the code is run directly as a script
or imported from elsewhere.
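Here is a self-contained sketch of that experiment, assuming a throwaway print_name.py file written to a temporary folder. runpy.run_path with run_name="__main__" stands in for running the file directly, and importlib.import_module stands in for importing it:

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # A throwaway print_name.py that records and prints its own __name__.
    script = Path(folder) / "print_name.py"
    script.write_text("seen_name = __name__\nprint(seen_name)\n")

    # 1. Run the file as a script, like "python print_name.py" would.
    script_globals = runpy.run_path(str(script), run_name="__main__")

    # 2. Import the same file as a module, like "import print_name" would.
    sys.modules.pop("print_name", None)  # make sure we import it fresh
    sys.path.insert(0, folder)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("print_name")
    finally:
        sys.path.remove(folder)

print(script_globals["seen_name"])  # __main__
print(module.seen_name)             # print_name
```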
When you write code, you often write a couple of functions that help you solve your problem, and then you
apply those functions to the problem you have at hand.
For example, when I wrote some Python code to count valid passwords in an efficient manner, I wrote a class
to represent an automaton in a file called automaton.py:
## automaton.py
class Automaton:
    ...  # Implementation details omitted.
That class was problem-agnostic; it just implemented some basic behaviour related to automatons. It just
so happened that that behaviour was helpful for me to solve the problem of counting passwords efficiently,
so I imported that Automaton class in another file and wrote a little program to solve my problem. Thus, we
can say that, most of the time, the code in my automaton.py file will be imported and used
from elsewhere.
However, I also added a little demo of the functionality of the Automaton class in the automaton.py file.
Now, the problem is that I don’t want this little demo to run every time the Automaton class is imported by
another program, so I have to figure out a way to only run the demo if the automaton.py file is run directly as
a script… The reason is that my demo code has some print calls that wouldn’t make sense to a user
that just did import automaton from within another script… Imagine importing a module into your program
and suddenly having a bunch of prints in your console!
Now, we can use __name__ to avoid that! We have seen that __name__ is set to "__main__" when a script is
run directly, so we just have to check that:
## automaton.py
class Automaton:
    ...  # Implementation details omitted.
if __name__ == "__main__":
    print("Demo code.")
This is the most well-known use case of __name__. This is why you will commonly see snippets like
if __name__ == "__main__":
    main()
It is just the Pythonic way of separating the functions and classes and other definitions, that might be useful
for you to import later on, from the code that you only want to run if your program is the main piece of code
being executed.
By the way, this global variable __name__ really is a variable that just gets initialised without you having to
do anything. But you can assign to it, even though it is unlikely that you might need to do that. Hence, this
code is perfectly valid:
__name__ = "My name!"
if __name__ == "__main__":
    # This will never run:
    print("Inside the __main__ if.")
>>> def get_type_name(obj):
...     return type(obj).__name__
...
>>> get_type_name("hello")
'str'
>>> get_type_name(sum)
'builtin_function_or_method'
This is much shorter, much cleaner (doesn’t have nested function calls, for example), and much easier to
read, as the code says what it is doing. The name we picked for our function is good already, because it
is easy to make an educated guess about what the function does, but it is much better if the body of the
function itself makes it absolutely clear that we are getting what we want!
This ability of reaching out for the __name__ of things is useful, for example, when you want to print an error
message because you expected an argument of some type and, instead, you got something else. Using
__name__ you can get prettier error messages.
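For instance, here is a sketch with a hypothetical average function that refuses anything other than a list and uses type(numbers).__name__ to say what it got instead:

```python
def average(numbers):
    """Hypothetical helper that only accepts lists of numbers."""
    if not isinstance(numbers, list):
        # type(numbers).__name__ gives a clean name like 'str' or 'tuple'.
        raise TypeError(f"average() expects a list, not {type(numbers).__name__}")
    return sum(numbers) / len(numbers)


print(average([1, 2, 3]))  # 2.0

message = None
try:
    average("123")
except TypeError as error:
    message = str(error)
print(message)  # average() expects a list, not str
```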
You can query the __name__ of things other than built-in types. You can also query the name of functions,
for example:
>>> sum.__name__
'sum'
>>> get_type_name.__name__
'get_type_name'
This might be relevant if you get ahold of a function in a programmatic way and need to figure out what
function it is:
>>> import random
>>> fn = random.choice([sum, get_type_name])
>>> fn.__name__
'sum'
I don’t think you are likely to receive a function from a random.choice call, but this just shows how you can
use __name__ to figure out what function you are looking at.
Your custom classes also come with a __name__. If you define a class,
__name__ is a very clean way of accessing the pretty class name without having to jump through too
many hoops or doing hacky string processing:
>>> class A():
...     pass
...
>>> A
<class '__main__.A'>
>>> A.__name__
'A'
>>> a = A()
>>> a
>>> type(a)
<class '__main__.A'>
>>> type(a).__name__
'A'
Similarly to the module __name__, the __name__ attribute of types, functions, etc., can be assigned directly:
>>> type(a).__name__
'A'
>>> A.__name__ = "name..?"
>>> type(a).__name__
'name..?'
Sometimes this is useful, for example when you need to copy some metadata from one object to another.
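A common case is writing decorators: the wrapper function replaces the original, so you want the wrapper to carry the original’s __name__. Here is a sketch with a hypothetical logged decorator; functools.wraps copies __name__ (along with __doc__, __module__ and other attributes) from the wrapped function for you:

```python
import functools


def logged(func):
    """Hypothetical decorator that prints the function name on every call."""
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@logged
def get_type_name(obj):
    return type(obj).__name__


print(get_type_name.__name__)  # get_type_name, not wrapper
print(get_type_name(3))        # prints "Calling get_type_name", then int
```

Without @functools.wraps, get_type_name.__name__ would be 'wrapper'. Writing wrapper.__name__ = func.__name__ by hand would fix the name, but wraps also takes care of the other metadata.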
Examples in code
I showed you what the __name__ attribute means, both as a module attribute and as an
attribute of type objects, and now I will show you how this knowledge can be put into practice. I will be
drawing my examples from the Python Standard Library, as per usual.
Just out of curiosity, the Python Standard Library for my installation of Python 3.9.2 has 2280 .py files, and
if you look for it, you can find the line if __name__ == "__main__": in 469 files, a little over a fifth of the
files… So this really is a common pattern in Python!
>>> Colour # <--
<enum 'Colour'> # <-- this is what `cls` is...
# By the way, this is *not* a string.
So we could get its __name__ directly and produce a pretty error message, or at least as pretty as error
messages go.
@classmethod
def from_decimal(cls, dec):
    """Converts a finite Decimal instance to a rational number, exactly."""
    from decimal import Decimal
    if isinstance(dec, numbers.Integral):
        dec = Decimal(int(dec))
    elif not isinstance(dec, Decimal):
        raise TypeError(
            "%s.from_decimal() only takes Decimals, not %r (%s)" %
            (cls.__name__, dec, type(dec).__name__))
    return cls(*dec.as_integer_ratio())
Notice how the function takes a dec and tries to convert it to a Decimal if the argument isn’t a Decimal but
is easy to treat as one. That is why giving 3 to the function doesn’t give an error:
>>> fractions.Fraction.from_decimal(3)
Fraction(3, 1)
However, "3" is not a numbers.Integral and it is also not a Decimal, so dec fails the tests and we end up
with
raise TypeError(
    "%s.from_decimal() only takes Decimals, not %r (%s)" %
    (cls.__name__, dec, type(dec).__name__))
Notice how we even have two __name__ usages here. The first one is similar to the example above with Enum,
and we take our cls (that is already a class) and simply ask for its name. That is the part of the code that
builds the beginning of the message:
TypeError: Fraction.from_decimal() ...
           ^^^^^^^^
Then we print the value that actually got us into trouble, and that is what the dec is doing there:
TypeError: Fraction.from_decimal() only takes Decimals, not '3' ...
                                                            ^^^
Finally, we want to tell the user what it is that the user passed in, just in case it isn’t clear from the beginning
of the error message. To do that, we figure out the type of dec and then ask for its __name__, hence the
type(dec).__name__ in the code above. This is what produces the end of the error message:
TypeError: Fraction.from_decimal() only takes Decimals, not '3' (str)
                                                                 ^^^
The "%s" and "%r" in the string above have to do with string formatting, a topic that is yet to be covered in
these Pydon’ts. Stay tuned to be the first to know when those Pydon’ts are released.
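You can reproduce this behaviour yourself. The sketch below also defines a hypothetical subclass, MyFraction, to show that cls.__name__ follows whichever class the classmethod was called on. The exact wording of the message may vary between Python versions, so the sketch prints it rather than hard-coding it:

```python
import fractions


class MyFraction(fractions.Fraction):
    """Hypothetical subclass, used only to show that cls.__name__ changes."""


def from_decimal_error(cls, value):
    """Return the TypeError message raised by cls.from_decimal(value), if any."""
    try:
        cls.from_decimal(value)
    except TypeError as error:
        return str(error)
    return None  # no error was raised


print(from_decimal_error(fractions.Fraction, "3"))
print(from_decimal_error(MyFraction, "3"))  # starts with MyFraction.from_decimal()
print(fractions.Fraction.from_decimal(3))   # 3 is Integral, so no error: prints 3
```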
This type(obj).__name__ pattern is also very common. In my 3.9.2 installation of the Python Standard
Library, it appeared 138 times in 74 different .py files. The specific cls.__name__ pattern also showed up
a handful of times.
Logging convention
For the final code example I will be showing you a common convention that is practised when using the
logging module to log your programs.
The logging module provides a getLogger function to the users, and that getLogger function accepts a
name string argument. This is so that getLogger can return a logger with the specified name.
On the one hand, you want to name your loggers so that, inside huge applications, you can tell what logging
messages came from where. On the other hand, the getLogger function always returns the same logger if
you give it the same name, so that inside a single module or file, you don’t need to pass the logger around,
you can just call getLogger always with the same name.
Now, you want to always get your logger using the same name, and you also want the name to identify clearly
and unequivocally the module that the logging happened from. This shows that hand-picking something like
"logger" is a bad idea, as I am likely to pick the same logger name as other developers picked in their code,
and so our logging will become a huge mess if our code interacts.
The other obvious alternative is to name it something specific to the module we are in, like the file name.
However, if I set the logger name to the file name by hand, I know I will forget to update it if I end up
changing the file name, so I am in a bit of a pickle here…
Thankfully, this type of situation is a textbook example of when the __name__ attribute might come in handy!
The __name__ attribute gives you a readable name that clearly identifies the module it is from, and using
__name__ even means that your logging facilities are likely to behave well if your code interacts with other
code that also does some logging.
This is why using getLogger(__name__) is the convention recommended in the documentation, and that is
why this pattern is used approximately 84% of the time! (It is used in 103 .py files out of the 123 .py files
that call the getLogger function in the Python Standard Library.)
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“The __name__ attribute is a dynamic attribute that tells you the name of the module you are in,
or the name of the type of your variables.”
This Pydon’t showed you that:
• __name__ is a module attribute that tells you the name of the module you are in;
• __name__ can be used to tell if a program is being run directly by checking if __name__ is "__main__";
• you can and should use __name__ to access the pretty name of the types of your objects;
• __name__ is an attribute that can be assigned to without any problem;
• the if statement if __name__ == "__main__": is a very Pythonic way of making sure some code only
runs if the program is run directly;
• the pattern type(obj).__name__ is a simple way of accessing the type name of an object; and
• there is a well-established convention that uses __name__ to set the name of loggers when using the
logging module.
References
• Python 3 Documentation, The Python Standard Library, Built-in Types, Special Attributes, https://docs.python.org/3/library/stdtypes.html?highlight=name#definition.__name__ [last accessed 29-06-2021];
• Python 3 Documentation, The Python Standard Library, Top-level script environment, https://docs.python.org/3/library/ma
main [last accessed 29-06-2021];
• Python 3 Documentation, The Python Language Reference, The import system, https://docs.python.org/3/reference/impo
[last accessed 29-06-2021];
• Python 3 Documentation, Python HOWTOs, Logging HOWTO, Advanced Logging Tutorial, https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial [last accessed 29-06-2021];
• Python 3 Documentation, The Python Standard Library, calendar, https://docs.python.org/3/library/calendar.html [last accessed 29-06-2021];
• Python 3 Documentation, The Python Standard Library, enum, https://docs.python.org/3/library/enum.html [last accessed 29-06-2021];
• Python 3 Documentation, Search results for the query “__name__”, https://docs.python.org/3/search.html?q=name&chec
[last accessed 29-06-2021];
Bite-sized refactoring
Introduction
Refactoring code is the act of going through your code and changing bits and pieces, generally with the
objective of making your code shorter, faster, or better by any metric you set.
In this Pydon’t I share my thoughts on the importance of refactoring and I share some tips for when you
need to refactor your code, as I walk you through a refactoring example.
In this Pydon’t, you will:
• understand the importance of refactoring;
• walk through a real refactoring example with me; and
• learn tips to employ when refactoring your own code.
Refactoring
REFACTOR – verb
“restructure (the source code of an application or piece of software) so as to improve operation
without altering functionality.”
As you can see from the definition above, the act of refactoring your code is an attempt at making your code
better. Making your code better might mean different things, depending on your context:
• it might mean it is easier to maintain;
• it might mean it is easier to explain to beginners;
• it might mean it is faster;
• …
Regardless of the metric(s) you choose to improve, everyone can benefit from learning to refactor code.
Why is that?
When you are refactoring code you are training a series of skills that are helpful to you as a developer, like
your ability to read code and really comprehend it, pattern recognition skills, critical thinking, amongst others.
Critical thinking
When reading code you wish to refactor, you will invariably find pieces of code that look like they shouldn’t
be there.
This can have many meanings.
It might be a piece of code that is in the wrong file, or in the wrong function. Sometimes, it might even be a
piece of code that looks like it could (or should) be deleted. At these points in time, the only thing you can
do is use your brain to figure out the implications of moving things around. You shouldn’t be afraid to move
things around, after you have considered the implications of leaving things as-is versus changing them.
Remember, you should strive to write elegant code, and part of that entails writing code in a way that makes
it as easy as possible to refactor later on. Code is a mutable thing, so make life easier for your future self
by writing elegant code that is easy to read.
What to refactor?
I am sure that people with different life experiences will answer this question differently; the only thing I
can do is share my point of view on the subject.
Refactor often…
… or at least create the conditions for that.
If you have the possibility to refactor a piece of code and you know there are things that can be improved
upon, go ahead and do it. As you mature as a developer and gain experience, you keep learning new things;
on top of that, the technologies you are using are probably also evolving over time. This means that code
naturally goes into a state where it could benefit from refactoring.
This is a never-ending cycle: you should write code that is elegant and easy to read; that means that, in the
future, refactoring the code is easier and faster; refactoring makes the code easier to read and even more
elegant; which makes it easier to refactor in the future; that will make it easier to read and more elegant; and
so on and so forth.
Code refactoring shouldn’t be a daunting task because there is much to gain from it, so make sure to write
your code in a way that will allow you, or someone else, to refactor it later.
Case study
Now I will go in-depth into a short Python function that was written by a beginner and shared on Reddit. I will
walk you through the process that happened in my brain when I tried refactoring that piece of code, and I
will share little tips as we go along.
First, let me tell you the task that the code is supposed to solve.
Write a function that takes a string and changes the casing of its letters:
• letters in even positions should become uppercase; and
• letters in odd positions should become lowercase.
Go ahead and try solving this task.
Starting point
The piece of code that was shared on the Internet was the following:
def myfunc(a):
    empty=[]
    for i in range(len(a)):
        if i%2==0:
            empty.append(a[i].upper())
        else:
            empty.append(a[i].lower())

    return "".join(empty)
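With consistent spacing around the operators, the same function reads:
def myfunc(a):
    empty = []
    for i in range(len(a)):
        if i % 2 == 0:
            empty.append(a[i].upper())
        else:
            empty.append(a[i].lower())

    return "".join(empty)
The only difference here is the spacing in empty = [] and in if i % 2 == 0:. Spacing around operators
is very important because it gives your code room to breathe. Making sure that your code has a consistent
style goes a long way towards making it readable to yourself and to others, so do try to build the habit of
following a certain style.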
PEP 8 proposes a Python style and many follow that style, so it might be a good idea to take your time to
review that style guide. After you figure out how that style works, remember that you don’t need to start doing
everything at the same time. You can pick that style up gradually. Also, recall that critical thinking is very
important. Sometimes it is best to ignore the style guide completely.
Naming
Names are very important, and naming your functions and variables correctly is crucial. Names can make
or break a program. Good names aid the reader of the code, whereas bad names make you spend hours
analysing otherwise simple code.
Names should reflect the intent, or a very important property, of the thing they refer to. This is the opposite
of using very generic names, like myfunc for a function or num for a number, when that function has a specific
role or that number contains some specific information.
A notable exception is the usage of i in for loops, for example, although personally I tend to prefer the slightly
more verbose idx.
So, looking at the code we currently have, I can identify three names that could be improved upon. Can you
figure out what those are? Have a go at changing them to something better.
Now, your suggestion doesn’t have to match mine, but here is what I came up with:
def alternate_casing(text):
    letters = []
    for idx in range(len(text)):
        if idx % 2 == 0:
            letters.append(text[idx].upper())
        else:
            letters.append(text[idx].lower())

    return "".join(letters)
Here are the changes that I made:
• myfunc -> alternate_casing;
• a -> text;
• empty -> letters; and
• i -> idx (because of my personal preference).
Now, in and of itself, empty seems to be a pretty good name for an empty list. However, right after we
initialise empty with the empty list, we start filling it in, so the name doesn’t reflect a property of the object
that holds throughout the program or that is important. Instead, by naming it letters, we specify what will be
stored in there.
In our function we need the indices and the data, because we need the index to determine the operation to
do, and then we need the data (the actual letter) to change its casing. Using enumerate, here is how that
loop would end up:
def alternate_casing(text):
    letters = []
    for idx, letter in enumerate(text):
        if idx % 2 == 0:
            letters.append(letter.upper())
        else:
            letters.append(letter.lower())

    return "".join(letters)
Not only were we able to remove the explicit indexing, therefore cutting down on one operation, but we also
express our intent more clearly: when someone finds an enumerate, they should immediately understand
that to mean “in this loop I need both the indices and the data I’m traversing”.
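To see exactly what enumerate hands to the loop, you can convert it into a list: each element is an (index, item) pair, which the for loop above unpacks into idx and letter. The string "Hey" is just an arbitrary example.

```python
# enumerate pairs each item with its index, starting at 0 by default.
pairs = list(enumerate("Hey"))
print(pairs)  # [(0, 'H'), (1, 'e'), (2, 'y')]
```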
Now, if we work on factoring out that .append(), because that’s independent of the value of idx % 2, we
could get something like
def alternate_casing(text):
    letters = []
    for idx, letter in enumerate(text):
        if idx % 2 == 0:
            capitalised = letter.upper()
        else:
            capitalised = letter.lower()
        letters.append(capitalised)

    return "".join(letters)
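You may feel strongly about the fact that I just added a line of code, making the code longer instead of
shorter, but sometimes better code takes up more space. However… the if-else block now only decides which
value gets assigned to capitalised, and conditional expressions excel at conditional assignments. On top of
that, idx % 2 is either 0 (which is Falsy) or 1 (which is Truthy), so we can use it directly as the condition if we
swap the two branches. Putting the conditional expression straight into the .append() call, we get: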
def alternate_casing(text):
    letters = []
    for idx, letter in enumerate(text):
        letters.append(letter.lower() if idx % 2 else letter.upper())
    return "".join(letters)
At this point, the function is getting so short that there’s no point in having an extra blank line separating the
return statement, so I decided to put everything together.
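A quick sanity check of the condition used above: idx % 2 is 0 for even indices and 1 for odd ones, so "if idx % 2" holds exactly when idx is odd. The sample input "hello world" is an arbitrary choice; note that the space counts as a position too, just like in the original function.

```python
def alternate_casing(text):
    letters = []
    for idx, letter in enumerate(text):
        # idx % 2 is 0 (Falsy) for even indices and 1 (Truthy) for odd ones,
        # so odd positions go through .lower() and even ones through .upper().
        letters.append(letter.lower() if idx % 2 else letter.upper())
    return "".join(letters)


print(bool(4 % 2), bool(5 % 2))       # False True
print(alternate_casing("hello world"))  # HeLlO WoRlD
```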
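Next, list comprehensions are good alternatives to simple for loops with .append() operations, so we can
build letters directly:
def alternate_casing(text):
    letters = [
        letter.lower() if idx % 2 else letter.upper()
        for idx, letter in enumerate(text)
    ]
    return "".join(letters)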
Auxiliary variables
Once again, auxiliary variables aren’t always needed. Whether you split the list comprehension across several
lines or write it on a single line with short names, you can just get rid of the auxiliary variable and call
.join() on those letters directly:
def alternate_casing(text):
    return "".join([l.lower() if i % 2 else l.upper() for i, l in enumerate(text)])
or
def alternate_casing(text):
    return "".join([
        letter.lower() if idx % 2 else letter.upper()
        for idx, letter in enumerate(text)
    ])
Final comparison
For your reference, here is the code we started with:
def myfunc(a):
    empty=[]
    for i in range(len(a)):
        if i%2==0:
            empty.append(a[i].upper())
        else:
            empty.append(a[i].lower())

    return "".join(empty)
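and here are two possible end products, in which the list comprehensions were turned into generator
expressions by dropping the square brackets: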
def alternate_casing(text):
    return "".join(l.lower() if i % 2 else l.upper() for i, l in enumerate(text))
and
def alternate_casing(text):
    return "".join(
        letter.lower() if idx % 2 else letter.upper()
        for idx, letter in enumerate(text)
    )
Notice how different the end products look from the starting point, even though we got there one small
change at a time. Take your time to understand the small steps separately, and then appreciate how they all
fit together in this refactor.
One of the main takeaways is really that refactoring doesn’t need to happen in one fell swoop. It is ok to do
incremental changes, and maybe even preferable: incremental changes are easier to manage and easier to
reason about.
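The definition at the start of this Pydon’t says refactoring should improve code “without altering functionality”. A cheap way to build confidence in a refactor like this one is to keep the starting function around and compare both versions on a few inputs. The sample strings below are arbitrary, so this is a quick sanity check rather than a proof:

```python
def myfunc(a):
    # The starting point, with consistent spacing.
    empty = []
    for i in range(len(a)):
        if i % 2 == 0:
            empty.append(a[i].upper())
        else:
            empty.append(a[i].lower())
    return "".join(empty)


def alternate_casing(text):
    # One of the end products.
    return "".join(
        letter.lower() if idx % 2 else letter.upper()
        for idx, letter in enumerate(text)
    )


samples = ["", "a", "Hello, world!", "PYTHON", "pydon't"]
for sample in samples:
    assert myfunc(sample) == alternate_casing(sample), sample
print("Both versions agree on all samples.")
```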
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Elegant code is easier to refactor, and when you refactor your code, you should strive to make
it more elegant.”
This Pydon’t showed you that:
• the ability to refactor code is important;
• the ability to refactor code is something you train;
• code refactoring can (and maybe should!) happen in small steps;
• consistent style increases code readability;
• auto-formatters can help enforce a fixed style upon our code;
• naming is important and should reflect
– the purpose of an object; or
– an important characteristic that is invariant;
• enumerate is your best friend when traversing data and indices;
• repeated code under an if-else block can be factored out;
• conditional expressions excel at conditional assignments;
• if conditions can be simplified with Truthy and Falsy values;
• list comprehensions are good alternatives to simple for loops with .append() operations; and
• list comprehensions can be turned into generator expressions.
References
• Reddit /r/learnpython post “I get zero output even though there’s nothing wrong with this code according
to pycharm. What can be the reason? I would appreciate any help.”,
https://www.reddit.com/r/learnpython/comments/o2ko8l/i_get_zero_output_even_though_theres_nothing
[last accessed 12-07-2021];
• Hoekstra, Conor; “Beautiful Python Refactoring” talk at PyCon US 2020,
https://www.youtube.com/watch?v=W-lZttZhsUY;
• PEP 8 – Style Guide for Python Code, https://www.python.org/dev/peps/pep-0008/ [last accessed
12-07-2021];
String translate and maketrans methods
Introduction
The string methods str.translate and str.maketrans might be some of the lesser known string methods
in Python.
Sadly, most online resources that cover this topic do a really poor job of explaining how the two methods
work, so hopefully this Pydon’t will fill that gap and introduce you to two really cool string methods.
In this Pydon’t, you will:
• be introduced to the string method str.translate;
• learn the available formats for the method translate;
• see that all characters (even emojis!) have a corresponding integer value;
• review the behaviour of the built-in functions ord and chr;
• learn about the complementary string method str.maketrans; and
• see good use cases for both str.translate and str.maketrans.
str.translate
The str.translate method is little known, but not because it is difficult to understand. It’s just
underappreciated, which means it doesn’t get used much, which means it gets less attention than it deserves,
which means people don’t learn it, which means it doesn’t get used much, … Do you see where this is going?
I won’t pretend that this method will completely revolutionise every single piece of Python code you will write
in your life, but it is a nice tool to have in your tool belt.
As per the documentation, the str.translate(table) method returns
“a copy of the string in which each character has been mapped through the given translation
table.”
The translation table being mentioned here is the only argument that the method str.translate accepts.
In its simplest form, the method str.translate is similar to the method str.replace.
In case you don’t know it, here is what str.replace looks like:
>>> s = "Hello, world!"
>>> s.replace("l", "L")
'HeLLo, worLd!'
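For comparison, here is a minimal sketch of the same substitution with str.translate. It uses the built-in ord, covered next, to turn "l" into the integer key that the translation table expects:

```python
s = "Hello, world!"
# The table maps the code point of "l" to the replacement string "L".
print(s.translate({ord("l"): "L"}))  # HeLLo, worLd!
print(s.replace("l", "L"))           # HeLLo, worLd!
```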
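Before we can build translation tables by hand, we need to know that every character corresponds to an
integer. The built-in functions ord and chr convert between characters and those integers: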
>>> ord("A")
65
>>> ord("a")
97
>>> ord(" ")
32
>>> chr(65)
'A'
>>> chr(97)
'a'
>>> chr(32)
' '
>>> chr(128013)
'🐍'
Notice that even emoji have an integer that represents them!
chr takes an integer and returns the character that that integer represents, whereas ord takes a character
and returns the integer corresponding to its Unicode code point.
The “code point” of a character is the integer that corresponds to it in the standard being used – which is
the Unicode standard in the case of Python.
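Unicode code points are usually written in hexadecimal, which is also how you can type them with the \U escape in a string literal. This short sketch shows that 128013 is the snake’s code point U+1F40D, and that ord and chr undo each other (the characters checked are arbitrary examples):

```python
snake = chr(128013)
print(snake, hex(ord(snake)))  # 🐍 0x1f40d
print(snake == "\U0001F40D")   # True

# ord and chr are inverses of each other:
for char in "Az 🐍":
    assert chr(ord(char)) == char
```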
Translation dictionaries
Now that we know about the code points of characters, we can learn how to use the method str.translate,
because now we can build dictionaries that can be passed in as translation tables.
The translation dict that is fed as the argument to str.translate specifies the substitutions that are going
to take place in the target string.
The dictionary needs to map Unicode code points (the integers that represent characters) to other Unicode
code points, to strings, or to None.
Let’s see if you can infer how each case works:
>>> ord("a"), ord("b"), ord("c")
(97, 98, 99)
>>> ord("A")
65
>>> "aaa bbb ccc".translate(
... {97: 65, 98: "BBB", 99: None}
... )
'AAA BBBBBBBBB '
Notice that the method str.translate above received a dictionary with 3 keys:
• 97 (the code point for "a") mapped to 65 (the code point for "A");
• 98 (the code point for "b") mapped to "BBB"; and
• 99 (the code point for "c") mapped to None.
In the final result, we see that all lower case “a”s were replaced with upper case “A”s, the lower case “b”s
were each replaced with “BBB” (which is why we started with three “b”s and ended up with nine “B”s), and
the lower case “c”s were removed.
This is subtle, but notice that the empty spaces were left intact. What happens if the string contains other
characters?
>>> "Hey, aaa bbb ccc, how are you?".translate(
... {97: 65, 98: "BBB", 99: None}
... )
'Hey, AAA BBBBBBBBB , how Are you?'
We can see that the characters that were not keys of the dictionary were left as-is.
Hence, the translation works as follows:
• characters that do not show up in the translation table are left untouched;
• all other characters are replaced with their values in the mapping; and
• characters that are mapped to None are removed.
Non-equivalence to str.replace
Some of you might be thinking that I’m just being silly, making a huge fuss about str.translate, when all
I need is a simple for loop and the method str.replace. Are you right?
Let me rewrite the example above with a for loop and the string method str.replace:
>>> s = "Hey, aaa bbb ccc, how are you?"
>>> from_ = "abc"
>>> to_ = ["A", "BBB", ""]
>>> for f, t in zip(from_, to_):
... s = s.replace(f, t)
...
>>> s
'Hey, AAA BBBBBBBBB , how Are you?'
As we can see, the result seems to be exactly the same, and we didn’t have to introduce a new string method.
If you are not comfortable with the zip in that for loop above, I got you: take a look at the Pydon’t about
zip.
Of course, we are forgetting the fact that the for loop technique using successive str.replace calls is doing
more work than the str.translate method. What do I mean by this?
For every loop iteration, the str.replace method has to go over the whole string looking for the character
we want to replace, and that’s because consecutive str.replace calls are independent of one another.
But wait, if the successive calls are independent from one another, does that mean that…? Yes!
What if we wanted to take a string of zeroes and ones and replace all zeroes with ones, and vice-versa? Here
is the solution using the successive str.replace calls:
>>> s = "001011010101001"
>>> from_ = "01"
>>> to_ = "10"
>>> for f, t in zip(from_, to_):
... s = s.replace(f, t)
...
>>> s
'000000000000000'
It didn’t work! Why not? After the first iteration is done, all zeroes have been turned into ones, and s looks
like this:
>>> s = "001011010101001"
>>> s.replace("0", "1")
'111111111111111'
The second iteration of the for loop has no way to know which ones are original and which ones used to be
zeroes that were just converted, so the call s.replace("1", "0") just replaces everything with zeroes.
In order to achieve the correct effect, we need str.translate:
>>> "001011010101001".translate(
... {ord("0"): "1", ord("1"): "0"}
... )
'110100101010110'
Therefore, we have shown that str.translate is not equivalent to making a series of successive calls to
str.replace, because str.replace might jumble the successive transformations.
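One way to see why str.translate does not jumble its substitutions is to model it by hand: each character of the original string is looked up in the table exactly once, and replacements are never fed back into the table. The function translate_by_hand below is a hypothetical, simplified model that only handles dictionary tables (the real method accepts any object that can be indexed with integers):

```python
def translate_by_hand(text, table):
    """Hypothetical, simplified model of str.translate for dict tables."""
    pieces = []
    for char in text:  # each original character is looked up exactly once
        code_point = ord(char)
        if code_point not in table:
            pieces.append(char)  # not in the table: left untouched
            continue
        replacement = table[code_point]
        if replacement is None:
            continue  # mapped to None: removed
        if isinstance(replacement, int):
            replacement = chr(replacement)  # a code point becomes its character
        pieces.append(replacement)
    return "".join(pieces)


swap = {ord("0"): "1", ord("1"): "0"}
print(translate_by_hand("001011010101001", swap))  # 110100101010110
print("001011010101001".translate(swap))           # 110100101010110
```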
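Translation tables don’t have to be dictionaries, though: any object that can be indexed with integers will do,
such as a list. Suppose translation_table is a list that maps each code point to itself, except for the code
points of the upper case letters, which are mapped to the corresponding lower case letter, doubled: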
>>> translation_table[60:70]
[60, 61, 62, 63, 64, 'aa', 'bb', 'cc', 'dd', 'ee']
Now, we just need to call the method str.translate:
>>> "Hey, what's UP?".translate(translation_table)
"hhey, what's uupp?"
Here is all of the code from this little example, also making use of the string module, so that I don’t have
to type all of the alphabet again:
>>> from string import ascii_uppercase
>>> ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
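Here is one way the full example could be written. The table size is an assumption of this sketch: it only covers the 128 ASCII code points. Characters beyond the end of the list are still safe, because indexing past the end raises IndexError (a kind of LookupError), and str.translate leaves a character untouched when the table lookup raises LookupError:

```python
from string import ascii_uppercase

# Start with the identity mapping for the ASCII code points (an assumption)...
translation_table = list(range(128))
# ...then map each upper case letter to its lower case version, doubled.
for upper in ascii_uppercase:
    translation_table[ord(upper)] = 2 * upper.lower()

print(translation_table[60:70])  # [60, 61, 62, 63, 64, 'aa', 'bb', 'cc', 'dd', 'ee']
print("Hey, what's UP?".translate(translation_table))  # hhey, what's uupp?
print("ÀB".translate(translation_table))  # Àbb ("À" is past the end of the list)
```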
str.maketrans
Having seen the generic form of translation tables, it is time to introduce str.translate’s best friend,
str.maketrans.
The method str.maketrans is a utility method that provides a convenient way of creating translation
tables that can be used with str.translate.
str.maketrans accepts up to 3 arguments, so let’s break them down for you.
Single argument
The version of str.maketrans that only accepts one argument has the purpose of making it simpler for us,
users, to define dictionaries that can be used with str.translate.
Why would that be useful?
As we have seen above, when using dictionaries as translation tables we need to make sure that the keys of
the dictionary are the code points of the characters we want to replace.
This generally introduces some boilerplate, because in the most common cases we know the characters we
want to replace, not their code points, so we need to convert them beforehand, or call ord when defining
the dictionary.
This is ugly; just take a look at the example we used before:
>>> "001011010101001".translate(
... {ord("0"): "1", ord("1"): "0"}
... )
'110100101010110'
It would be lovely if we could just write the dictionary in its most natural form:
trans_table = {"0": "1", "1": "0"}
For this to work, we need to use str.maketrans:
>>> "001011010101001".translate(
... str.maketrans({"0": "1", "1": "0"})
... )
'110100101010110'
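To see what str.maketrans does with a single dictionary argument, you can print its result: the character keys are converted to code points, while the values are kept as they are. Keys longer than one character are rejected with a ValueError:

```python
table = str.maketrans({"0": "1", "1": "0"})
print(table)  # {48: '1', 49: '0'}
print("001011010101001".translate(table))  # 110100101010110

# Keys must be single characters (or code points).
try:
    str.maketrans({"01": "10"})
except ValueError as error:
    print("ValueError:", error)
```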
Two arguments
If you look at the example I just showed, we see that we did a very specific type of translation: we replaced
some characters with some other single characters.
This is so common that the method str.maketrans can be used to create translation tables of this sort. For
that, the first argument to str.maketrans should be a string consisting of the characters to be replaced, and
the second argument is the string with the corresponding new characters.
Redoing the example above:
>>> "001011010101001".translate(
... str.maketrans("01", "10")
... )
'110100101010110'
Here is another example where the two strings have different characters, just for the sake of diversity:
>>> "#0F45cd".translate(
... str.maketrans("abcdef", "ABCDEF")
... )
'#0F45CD'
In this example we took a hexadecimal value representing a colour and made sure all the letters were upper
case.
(Of course we could have, and maybe should have, done that with the method str.upper.)
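With two arguments, the values in the resulting table are code points rather than strings. The two strings are paired up position by position, so they must have the same length; otherwise str.maketrans raises a ValueError:

```python
print(str.maketrans("01", "10"))  # {48: 49, 49: 48}

try:
    str.maketrans("abc", "AB")  # three characters to replace, only two replacements
except ValueError as error:
    print("ValueError:", error)
```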
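Three arguments
The third argument to str.maketrans, when present, is a string with the characters that should be removed;
each of them is mapped to None in the translation table. For example, we can upper case the letters and drop
the # and the space in one go: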
>>> "# 0F45cd".translate(
... str.maketrans("abcdef", "ABCDEF", "# ")
... )
'0F45CD'
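Printing the three-argument table shows the letter-to-letter entries (as code points) next to the entries that map to None. When you only want to delete characters, the first two arguments can be empty strings:

```python
table = str.maketrans("abcdef", "ABCDEF", "# ")
print(table)
# Letters a-f map to the code points of A-F; "#" (35) and " " (32) map to None.

remove_vowels = str.maketrans("", "", "aeiou")
print("Hello, world!".translate(remove_vowels))  # Hll, wrld!
```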
Examples in code
Now that you have been introduced to the string methods str.translate and str.maketrans, I will share a
couple of interesting use cases for these methods.
I will start with a personal use case, and then include three use cases from the Python Standard Library.
These code examples should help you understand how the two methods are used in the real world.
Caesar cipher
I wrote on Twitter, asking people for their most Pythonic implementation of the Caesar cipher.
I defined the Caesar cipher as a function that takes two arguments. The first, a string, specifies some text.
The second, an integer, specifies the key. Then, the upper case letters of the argument string should be
shifted, along the alphabet, by the amount specified by the key. All other characters should be left as-is:
>>> caesar("ABC", 1)
'BCD'
>>> caesar("ABC", 13)
'NOP'
>>> caesar("ABC", 25)
'ZAB'
>>> caesar("HELLO, WORLD", 7)
'OLSSV, DVYSK'
Some time later, I went to Twitter again to comment on some straightforward solutions and to also share
the most elegant solution ever.
Can you guess what my Caesar implementation leverages? If you said/thought str.translate and
str.maketrans, you are absolutely right!
Here is the nicest implementation of the Caesar cipher you will ever see:
def caesar(msg, key):
    return msg.translate(
        str.maketrans(ABC, ABC[key:] + ABC[:key])
    )
In the code above, ABC is a global constant that contains the alphabet that is subject to change. If we set
ABC = string.ascii_uppercase, then we match exactly the Caesar cipher that I defined in the beginning:
>>> from string import ascii_uppercase
>>> ABC = ascii_uppercase
>>> def caesar(msg, key):
...     return msg.translate(
...         str.maketrans(ABC, ABC[key:] + ABC[:key])
...     )
...
>>> caesar("HELLO, WORLD", 7)
'OLSSV, DVYSK'
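The slicing trick works for keys between 0 and 25, but a key like 27 produces an empty first slice and the text comes out unchanged. A small variation, which is my own addition and not part of the implementation above, normalises the key with the modulo operator. As a bonus, decoding becomes encoding with the negated key:

```python
from string import ascii_uppercase

ABC = ascii_uppercase


def caesar(msg, key):
    # Normalising the key keeps both slices meaningful for any integer key,
    # so a key of 27 behaves like a key of 1 and -7 behaves like 19.
    key %= len(ABC)
    return msg.translate(str.maketrans(ABC, ABC[key:] + ABC[:key]))


secret = caesar("HELLO, WORLD", 7)
print(secret)              # OLSSV, DVYSK
print(caesar(secret, -7))  # HELLO, WORLD
print(caesar("ABC", 27))   # BCD
```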
The zipfile module, which reads and writes ZIP archives, uses str.translate to replace characters that are not
allowed in Windows file names:
class ZipFile:
    # ...

    @classmethod
    def _sanitize_windows_name(cls, arcname, pathsep):
        """Replace bad characters and remove trailing dots from parts."""
        table = cls._windows_illegal_name_trans_table
        if not table:
            illegal = ':<>|"?*'
            table = str.maketrans(illegal, '_' * len(illegal))
            cls._windows_illegal_name_trans_table = table
        arcname = arcname.translate(table)
        # ...
The arcname is the name of a file inside the archive. The first thing we do is fetch the table and see if it has
been set. If it has not been set, then we set it for ourselves!
We define a series of illegal characters, and then use str.maketrans to create a translation table that
translates them to underscores _:
>>> illegal = ':<>|"?*'
>>> table = str.maketrans(illegal, '_' * len(illegal))
>>> table
{58: 95, 60: 95, 62: 95, 124: 95, 34: 95, 63: 95, 42: 95}
Then, we save this computed table for later and proceed to translating arcname.
This shows a straightforward usage of both str.maketrans and str.translate.
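Applied to a name, the table turns every illegal character into an underscore. The file name below is made up for illustration, and this sketch only covers the translation step, not the rest of _sanitize_windows_name:

```python
illegal = ':<>|"?*'
table = str.maketrans(illegal, "_" * len(illegal))

# A made-up name containing several characters that Windows rejects.
print('notes: "draft"?.txt'.translate(table))  # notes_ _draft__.txt
```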
Whitespace munging
(I didn’t know, so I Googled it: “to munge” means to manipulate data.)
In the same spirit, Python’s textwrap module (used to wrap text along multiple lines and to do other
related string manipulations) uses str.translate to munge whitespace in the given text.
As a preprocessing step to wrapping a string, we replace all sorts of funky whitespace characters with a
simple blank space.
Here is how this is done:
## In Lib/textwrap.py from Python 3.9.2
class TextWrapper:
    # ...
    unicode_whitespace_trans = {}
    uspace = ord(' ')
    for x in _whitespace:
        unicode_whitespace_trans[ord(x)] = uspace
    # ...
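The same kind of table can be rebuilt outside of textwrap to see it in action. In this sketch, string.whitespace is a stand-in for textwrap’s private _whitespace constant, and the sample text is arbitrary:

```python
import string

# Map the code point of every whitespace character to the code point of " ".
uspace = ord(" ")
unicode_whitespace_trans = {}
for x in string.whitespace:
    unicode_whitespace_trans[ord(x)] = uspace

text = "one\ttwo\nthree\x0bfour\x0cfive\rsix"
print(text.translate(unicode_whitespace_trans))  # one two three four five six
```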
Default replacement
If we peek at the source code for IDLE, the IDE that ships with Python, we can also find a usage of the
method str.translate, and this one in particular defines a custom object for the translation table.
Before showing you the code, let me tell you what it should do: we want to create a translation table that
• preserves newlines, quotes, backslashes, and “#”;
• maps “(”, “[”, and “{” to “(”;
• maps “)”, “]”, and “}” to “)”; and
• maps everything else to “x”.
The point here is that we need to parse some Python code and we are only interested in the structure of the
lines, rather than in the actual code that is written.
By replacing code elements with “x”, those “x”s can then be deduplicated. When the “x”s are deduplicated,
the string becomes (much!) smaller and the processing that follows becomes significantly faster. At least
that’s what the comments around the code say!
To help in this endeavour, we will implement a class called ParseMap that will be very similar to a vanilla
dict, with one exception: when we try to access a ParseMap with a key it doesn’t know, instead of raising a
KeyError, we return 120. Why 120? Because:
>>> ord("x")
120
Assuming ParseMap is already defined, here is what using it could look like:
>>> pm = ParseMap()
>>> pm
{}
>>> pm[0] = 343
>>> pm["hey"] = (1, 4)
>>> pm
{0: 343, 'hey': (1, 4)}
>>> pm[999]
120
By implementing this behaviour of returning 120 by default, we know that our translation table will map any
character to “x” by default.
Now that the idea was introduced, here is the code:
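## In Lib/idlelib/pyparse.py from Python 3.9.2
class ParseMap(dict):
    # [comments omitted for brevity]
    def __missing__(self, key):
        return 120  # ord('x')
When a dict subclass is indexed with a key it doesn’t have, Python calls its __missing__ dunder method, so
that is all ParseMap needs in order to return 120 by default. The translation table itself is then created with
the class method fromkeys, which builds a dictionary where all the given keys map to the same value: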
>>> dict.fromkeys("abc", 42)
{'a': 42, 'b': 42, 'c': 42}
>>> dict.fromkeys(range(3), "Hello, world!")
{0: 'Hello, world!', 1: 'Hello, world!', 2: 'Hello, world!'}
The line
trans = ParseMap.fromkeys(range(128), 120)
is there to explicitly map many common characters to “x”, which is supposed to speed up the translation
process itself.
In the source file, that line is followed by three lines that update the translation table in such a way that the
parentheses, brackets, and braces are mapped like I said they would.
In the end, the translation behaves like this:
>>> s = "(This [is]\tsome\ntext.)"
>>> print(s)
(This [is] some
text.)
>>> print(s.translate(trans))
(xxxxx(xx)xxxxx
xxxxx)
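Putting the pieces together, here is a self-contained sketch that reproduces the output above. ParseMap and the fromkeys call follow the code shown earlier; the three update lines, and in particular the set of characters kept as-is, are assumptions of this sketch rather than a verbatim copy of IDLE’s source:

```python
class ParseMap(dict):
    """A dict that maps any missing key to 120, the code point of "x"."""

    def __missing__(self, key):
        return 120  # ord("x")


# Pre-fill the 128 ASCII code points so common characters skip __missing__.
trans = ParseMap.fromkeys(range(128), 120)  # fromkeys returns a ParseMap
trans.update((ord(c), ord("(")) for c in "([{")  # opening brackets -> "("
trans.update((ord(c), ord(")")) for c in ")]}")  # closing brackets -> ")"
trans.update((ord(c), ord(c)) for c in "\"'\\\n#")  # kept as-is (assumption)

s = "(This [is]\tsome\ntext.)"
print(s.translate(trans))
print("λ = 1".translate(trans))  # "λ" is not ASCII, so __missing__ handles it
```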
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“When you need to replace several characters with other characters or strings, the method
str.translate is your best friend.”
This Pydon’t showed you that:
• the str.translate method replaces characters from an origin string with new characters or substrings;
• the character translation is controlled by a translation table that can be any object that supports indexing
by integers;
• all characters (even emojis!) can be converted to a unique integer, and back, through the use of the
built-in functions ord and chr;
• the “code point” of a character is the integer that represents it;
• Python uses the code points set by the Unicode standard, the most widely-used in the world;
• the translation tables make use of the code points of characters to decide what is replaced by what;
• in general, str.translate cannot be replaced with a series of calls to str.replace;
• Python provides a utility method (str.maketrans) to help us create translation tables:
– with a single argument, it can process dictionaries to have the correct format;
– with two arguments, it builds a translation table that maps single characters to single characters;
and
– the third argument indicates characters that should be removed from the string; and
• the __missing__ dunder method controls how custom dict subclasses work when indexed with missing
keys.
References
• Python 3 Documentation, The Python Standard Library, Built-in Types, str.maketrans,
https://docs.python.org/3/library/stdtypes.html#str.maketrans [last accessed 16-08-2021];
• Python 3 Documentation, The Python Standard Library, Built-in Types, str.translate,
https://docs.python.org/3/library/stdtypes.html#str.translate [last accessed 16-08-2021];
• Python 3 Documentation, The Python Standard Library, zipfile, https://docs.python.org/3/library/zipfile.html
[last accessed 17-08-2021];
• Python 3 Documentation, The Python Standard Library, textwrap, https://docs.python.org/3/library/textwrap.html
[last accessed 17-08-2021];
• Python 3 Documentation, The Python Standard Library, IDLE, https://docs.python.org/3/library/idle.html
[last accessed 17-08-2021];
• Unicode, https://home.unicode.org [last accessed 17-08-2021];
Boost your productivity with the REPL
Introduction
The REPL is an amazing tool that every Python programmer should really know and appreciate! Not only that,
but you stand to gain a lot if you get used to using it and if you learn to make the most out of it.
In this Pydon’t, you will:
• learn what “REPL” stands for;
• understand how important the REPL is for your learning;
• understand the mechanism that “prints” results in the REPL;
• see how to recover the previous result in the REPL, in case you forgot to assign it;
• learn about the built-in help system;
• learn some tips for when you’re quickly hacking something together in the REPL;
• be told about two amazing tools to complement your usage of the REPL.
REPL
Read. Evaluate. Print. Loop.
That’s what “REPL” stands for, and it is often referred to as “read-eval-print-loop”. The REPL is the program
that takes your input code (i.e., reads your code), evaluates it, prints the result, and then repeats (i.e., loops).
The REPL, sometimes also referred to as the “interactive session”, or the “interpreter session”, is what you
get when you open your computer’s command line and type python or python3.
That should result in something like the following being printed:
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
Of course, the exact things that are printed (especially the first line) are likely to differ from what I show here,
but it’s still the REPL.
(By the way, if you ever need to leave the REPL, just call the exit() function.)
REPL mechanics
Basic input and output
The REPL generally contains a >>> at the beginning of the line, to the left of your cursor. You can type code
in front of that prompt and press Enter. When you press Enter, the code is evaluated and you are presented
with the result:
>>> 3 + 3
6
Multiline input
The REPL also accepts code that spans multiple lines, like if statements, for loops, function definitions with
def, etc.
In order to do those, just start typing your Python code regularly:
>>> if True:
When you press Enter after the colon, Python realises the body of the if statement is missing, and thus starts
a new line containing a ... on the left. The ... tells you that this is the continuation of what you started
above.
To tell Python you are done with a multiline code block, press Enter on an empty line with the continuation
prompt ...:
>>> if True:
...     print("Hello, world!")
...
Hello, world!
>>>
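Because an empty line ends the block, blank lines inside a code block can trip up the REPL. Take this
function, which has a blank line between the def line and its body:
def double(x):

    return 2 * x
Copying the code above and pasting it into the session, you will end up with a session log like this: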
>>> def double(x):
...
  File "<stdin>", line 2
    ^
IndentationError: expected an indented block
>>>     return 2 * x
  File "<stdin>", line 1
    return 2 * x
IndentationError: unexpected indent
This happens because the REPL finds a blank line and thinks we tried to conclude the definition of the
function.
No printing, or None
If the expression you wrote evaluates to None, then nothing gets printed.
The easiest way to see this is if you just type None in the REPL. Nothing gets displayed; contrast that with
what happens if you just type 3:
>>> None
>>> 3
3
If you call a function that doesn’t have an explicit return value, or that returns None explicitly, then
nothing is shown in the REPL:
>>> def explicit_None_return():
...     # Return None explicitly.
...     return None
...
>>> explicit_None_return() # <- nothing gets displayed.
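The example above only covers the explicit case. A function with no return statement at all also gives back None, which you can confirm with an is None check. The function names in this sketch are made up for the illustration:

```python
def implicit_none_return():
    """No return statement at all."""
    total = 1 + 1  # computed, but never returned


def explicit_none_return():
    return None


print(implicit_none_return() is None)  # True
print(explicit_none_return() is None)  # True
# In the REPL, wrapping the call in print() is how you make the None visible:
print(implicit_none_return())          # None
```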
Repeated imports
Sometimes it is useful to use the REPL to quickly import a function you just defined in a source file. You
test the function out and then go back to the source file to change it. Then you’ll want to import the function
again and test it again, except that won’t work.
You need to understand how the REPL handles imports, because you can’t import repeatedly to “update”
what’s in the session.
To show you this, go ahead and create a file hello.py:
## In `hello.py`:
print("Being imported.")
Just that.
Now open the REPL:
>>> import hello
Being imported.
Now try modifying the string inside the print, and re-import the module:
>>> import hello
Being imported.
## Modify the file, then import again:
>>> import hello
>>>
Nothing happens! That’s because Python caches the modules it imports: your file was already run once, so a
second import of the same module reuses what is already in memory instead of parsing and running the file again.
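Under the hood, Python keeps every imported module in the sys.modules dictionary, and an import statement for a module that is already there simply reuses the cached module object. Here is a sketch that reproduces the experiment above in a script. The module name cache_demo and the temporary directory are assumptions made only for this illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, written to a fresh temporary directory.
sys.dont_write_bytecode = True            # keep the demo free of .pyc files
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir, "cache_demo.py")
module_file.write_text('print("Being imported.")\nGREETING = "hello"\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import cache_demo                          # runs the file: prints "Being imported."
first_module = cache_demo

# Edit the source file, then import it again.
module_file.write_text('print("Being imported, again!")\nGREETING = "bye"\n')
import cache_demo                          # prints nothing: the cached module is reused

print(cache_demo is first_module)               # True
print(cache_demo is sys.modules["cache_demo"])  # True
print(cache_demo.GREETING)                      # hello, the old value
```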
In short, if you modify variables, functions, code; and you need those changes to be reflected in the REPL,
then you need to leave the REPL with exit(), start it again, and import things again.
That’s why some of the tips for quick hacks I’ll share below are so helpful.
Edit: Another alternative – brought to my attention by a kind reader – is to use importlib.reload(module)
in Python 3.4+. In our example, you could use importlib.reload(hello):
>>> import hello
Being imported.
>>> import importlib # Use `imp` from Python 3.0 to Python 3.3
>>> importlib.reload(hello)
Being imported.
<module 'hello' from 'C:\\tmp\\hello.py'>
We get that final line because importlib.reload returns the module it reloaded.
You can take a look at this StackOverflow question and answers to learn a bit more about this approach.
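Be mindful that it may not work as you expect when you have multiple imports: for example, names that you
imported with a from hello import statement keep pointing to the old objects, and the modules that hello itself
imports are not reloaded. Exiting the REPL and opening it again may be the cleanest way to reload your imports
in those situations.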
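To see both the fix and the caveat in action, here is a sketch that edits a module on disk and then calls importlib.reload. The module name reload_demo is hypothetical. Notice that the attribute on the module object picks up the new value, while a name bound earlier with from reload_demo import GREETING still points to the old one:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module, created only for this sketch.
sys.dont_write_bytecode = True          # make sure the source file is re-read
tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir, "reload_demo.py")
module_file.write_text('GREETING = "hello"\n')
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import reload_demo
from reload_demo import GREETING        # a second, independent name binding

module_file.write_text('GREETING = "goodbye"\n')
reloaded = importlib.reload(reload_demo)  # re-runs the file inside the same module object

print(reloaded is reload_demo)   # True: reload returns the module it reloaded
print(reload_demo.GREETING)      # goodbye: attribute access sees the new value
print(GREETING)                  # hello: the name from `from ... import` is stale
```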
REPL history
I’ll be honest with you, I’m not entirely sure if what I’m about to describe is a feature of the Python REPL or
of all the command lines I have worked with in my entire life, but here goes:
You can use the up and down arrow keys to go over the history of expressions you already entered. That’s
pretty standard.
What’s super cool is that the REPL remembers this history of expressions, even if you exit the REPL, as long
as you don’t close the terminal.
>>> _
0 # <- it still evaluates to 0!
If you want to get back the magical behaviour of _ holding the result of the last expression, just delete _ with
del _.
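If you are curious about where _ comes from: the interactive interpreter passes every expression result to sys.displayhook, and the default hook (also available as sys.__displayhook__) prints the repr of the value and stores it in builtins._, skipping None entirely. That also explains the trick above: assigning to _ creates a variable that shadows builtins._, and del _ removes the shadow. A sketch that calls the default hook by hand:

```python
import builtins
import sys

# Call the default display hook directly, as the REPL does for each result.
sys.__displayhook__(42)        # prints 42 and stores it in builtins._
print(builtins._)              # 42

sys.__displayhook__(None)      # prints nothing and leaves builtins._ alone
print(builtins._)              # 42

_ = 0                          # a global named _ now shadows builtins._
sys.__displayhook__(73)        # builtins._ becomes 73...
print(_)                       # 0: ...but the global is found first
del _                          # remove the global shadow
print(_)                       # 73: name lookup falls back to builtins._
```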
sum(iterable, /, start=0)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
>>>
What is great about this help built-in is that it can even provide help about your own code, provided you
document it well enough.
Here is the result of calling help on a function defined by you:
>>> def my_function(a, b=3, c=4):
...     return a + b + c
...
>>> help(my_function)
Help on function my_function in module __main__:

my_function(a, b=3, c=4)

>>>
You can see that help tells you the module where your function was defined and it also provides you with the
signature of the function, default values and all!
To get more information from help you need to document your function with a docstring:
>>> def my_function(a, b=3, c=4):
...     """Return the sum of the three arguments."""
...     return a + b + c
...
>>> help(my_function)
Help on function my_function in module __main__:

my_function(a, b=3, c=4)
    Return the sum of the three arguments.

>>>
Now you can see that the help function also gives you the information stored in the docstring.
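The signature and the docstring that help displays can also be retrieved directly with the inspect module, which is handy when you want that information as plain values. A short sketch, redefining my_function from above:

```python
import inspect


def my_function(a, b=3, c=4):
    """Return the sum of the three arguments."""
    return a + b + c


signature = inspect.signature(my_function)
print(signature)                          # (a, b=3, c=4)
print(signature.parameters["b"].default)  # 3
print(inspect.getdoc(my_function))        # Return the sum of the three arguments.
```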
I’ll be writing a Pydon’t about docstrings soon. Be sure to subscribe to my newsletter so you don’t miss it!
Semicolons
Yes, really.
Python supports semicolons to separate statements:
>>> a = 3; b = a + 56; print(a * b)
177
However, this feature is something that often does not belong in your code, so refrain from using it.
Despite being generally inadequate for production code, the semicolons are your best friends when in the
REPL. I’ll explain it to you, and you’ll agree.
In the command line you can usually use the up and down arrows to cycle through the most recently typed
commands. You can do that in the REPL as well. Just try evaluating a random expression, then press the up
arrow and Enter again. That should run the exact same expression again.
Sometimes you will be working in the REPL testing out a solution or algorithm incrementally. However, if you
make a mistake, you must reset everything.
At this point, you just press the arrows up and down, furiously trying to figure out all the code you have run
already, trying to remember which were the correct expressions and which ones were wrong…
Semicolons can prevent that! You can use semicolons to keep track of your whole “progress” as you go:
whenever you figure out the next step, you can use the arrows to go up to the point where you last “saved
your progress” and then you can add the correct step at the end of your sequence of statements.
Here is an example of an interactive REPL session of me trying to order a list of names according to a list of
ages.
Instead of two separate assignments, I put them on the same line with ;:
>>> names = ["John", "Anna", "Bill"]; ages = [20, 40, 30]
I could have written
>>> names, ages = ["John", "Anna", "Bill"], [20, 40, 30]
but using the semicolon expresses the intent of having the two assignments in separate lines when it comes
time to write the real code down.
Then, I will try to see how to put the ages and names together in pairs:
>>> [(age, name) for name, age in zip(names, ages)]
[(20, 'John'), (40, 'Anna'), (30, 'Bill')]
However, at this point I realise I’m being redundant and I can just use zip if I reverse the order of the
arguments:
>>> list(zip(ages, names))
[(20, 'John'), (40, 'Anna'), (30, 'Bill')]
Now that I’m happy with how I’ve paired names and ages together, I use the arrow keys to go back to the
line with the assignment. Then, I use a semicolon to add the new piece of code I worked out:
>>> names = ["John", "Anna", "Bill"]; ages = [20, 40, 30]; info_pairs = zip(ages, names)
zip is an amazing tool in Python and is one of my favourite built-in functions. You can learn how to wield
its power with this Pydon’t.
Now I can move on to the next step, knowing that a mistake now won’t be costly: I can reset everything by
going up to the line with all the intermediate steps and running that single line.
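The session stops here, but as an illustration of how the saved line could grow by one more step, here is a sketch that finishes the task. Sorting the (age, name) pairs orders them by age, because tuples are compared item by item, starting with the first. This final step is only a possible continuation, not part of the session above:

```python
# One more step appended to the "saved progress" line with another semicolon.
names = ["John", "Anna", "Bill"]; ages = [20, 40, 30]; info_pairs = zip(ages, names); sorted_names = [name for age, name in sorted(info_pairs)]
print(sorted_names)  # ['John', 'Bill', 'Anna']
```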
While this style is not recommended for production code, it makes it more convenient to go up and
down the REPL history.
If you really want to push the boundaries, you can even write the body of a loop on the same line as the loop
itself and combine it with semicolons:
>>> i = 1
>>> while i < 30: print(i); i *= 2
...
1
2
4
8
16
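Every statement after the colon belongs to the loop body, exactly as if it were indented on its own line. This sketch checks that the one-line version and the usual indented version collect the same values:

```python
one_line = []
i = 1
while i < 30: one_line.append(i); i *= 2

indented = []
i = 1
while i < 30:
    indented.append(i)
    i *= 2

print(one_line == indented, one_line)  # True [1, 2, 4, 8, 16]
```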
Other tools
I try to stick to vanilla Python as much as possible when writing these Pydon’ts, for one simple reason: the
world of vanilla Python is huge and, for most developers, has lots of untapped potential.
However, I believe I would be doing you a disservice if I didn’t mention two tools that can really improve
your experience in/with the REPL.
Rich
“Rich is a Python library for rich text and beautiful formatting in the terminal.”
Rich is an open source library that I absolutely love. You can read the documentation and the examples to
get up to speed with Rich’s capabilities, but I want to focus on one specific feature:
>>> from rich import pretty
>>> pretty.install()
Running this in your REPL will change your life. With these two lines, Rich will pretty-print your variables and
even include highlighting.
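This sketch does not use Rich at all. It only illustrates, with the standard library, the general idea of changing how the REPL shows results: replace sys.displayhook with a function of your own. The hook below is hypothetical and uses pprint, so there is no highlighting:

```python
import builtins
import pprint
import sys


def pretty_displayhook(value):
    """Hypothetical display hook: pretty-print results with pprint (no colours)."""
    if value is None:           # like the default hook, stay silent for None
        return
    builtins._ = value          # keep the usual behaviour of _
    pprint.pprint(value, width=40)


sys.displayhook = pretty_displayhook   # in a REPL, every later result goes through it

# Calling the hook by hand shows what the REPL would print for such a result:
sys.displayhook({"names": ["John", "Anna", "Bill"], "ages": [20, 40, 30]})

sys.displayhook = sys.__displayhook__  # restore the default hook
```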
IPython
IPython is a command shell for interactive computing in multiple programming languages, originally
developed for the Python programming language. IPython offers introspection, rich media, shell syntax, tab
completion, and history, among other features.
In short, it is a Python REPL with more bells and whistles.
It is beyond the scope of this Pydon’t to tell you all about IPython, but it is something I had to mention (even
though I personally don’t use it).
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Get comfortable with using the REPL because that will make you a more efficient Python
programmer.”
This Pydon’t showed you that:
• the REPL is a great tool to help prototype small ideas and solutions;
• the REPL supports multiline input, and ends it when you enter an empty line;
• the REPL implicitly shows the result of the expressions you type, with the caveat that what is shown is
an object’s representation (repr), not its string value (str);
• you can use the arrows to navigate the history of the code you typed in the REPL;
• history of typed code is preserved after you exit the REPL, as long as you don’t close the terminal
window;
• None results don’t get displayed implicitly;
• repeatedly importing the same module(s) does not update their contents;
• you can access the result of the previous line using _;
• the help built-in can give you basic documentation about the functions, and other objects, you have
“lying around”; it even works on user-defined objects;
• by using docstrings, you improve the utility of the built-in help when used on custom objects;
• although they are not recommended as best practices, semicolons and in-line multiline statements
can save you time when navigating the history of the REPL;
• Rich is a tool that you can use in your REPL to automatically pretty-print results with highlighting;
• IPython is an alternative Python REPL that comes with even more bells and whistles.
If you liked this Pydon’t be sure to leave a reaction below and share this with your friends and fellow
Pythonistas. Also, subscribe to the newsletter so you don’t miss a single Pydon’t!
References
• Rich, https://github.com/willmcgugan/rich [last accessed 25-08-2021];
• IPython, https://ipython.org/ [last accessed 25-08-2021];
• Feedback with suggestions for improvements, Reddit comment, https://www.reddit.com/r/Python/comments/pbkq3z/boost_your_productivity_with_the_repl_pydont/hadom13/ [last accessed 26-08-2021];
• Stack Overflow question, “How to re import an updated package while in Python Interpreter?”, https://stackoverflow.com/q/684171/2828287 [last accessed 26-08-2021];
set and frozenset
Introduction
Python contains a handful of built-in types, among which you can find integers, lists, strings, etc…
Python also provides two built-in types to handle sets, the set and the frozenset.
In this Pydon’t, you will:
• understand the relationship between the set built-in and the mathematical concept of “set”;
• learn what the set and frozenset built-ins are;
• see what the differences between set and frozenset are;
• learn how to create sets and frozen sets;
• understand how sets fit in with the other built-in types, namely lists;
• establish a parallel between lists and tuples, and sets and frozen sets;
• see good example usages of set (and frozenset) in Python code.
(Mathematical) sets
A set is simply a collection of unique items where order doesn’t matter. Whenever I have to think of sets, I
think of shopping carts.
No ordering
If you go shopping, and you take a shopping cart with you, the order in which you put the items in the
shopping cart doesn’t matter. The only thing that actually matters is the items that are in the shopping cart.
If you buy milk, chocolate, and cheese, the order in which those items are registered doesn’t matter. What
matters is that you bought milk, chocolate, and cheese.
In that sense, you could say that the groceries you bought form a set: the set containing milk, chocolate,
and cheese. Both in maths and in Python, we use {} to denote a set, so here’s how you would define the
groceries set in Python:
>>> groceries = {"milk", "cheese", "chocolate"}
>>> groceries
{'cheese', 'milk', 'chocolate'}
>>> type(groceries).__name__
'set'
We can check that we created a set indeed by checking the __name__ of the type of groceries.
If you don’t understand why we typed type(groceries).__name__ instead of just doing type(groceries),
then I advise you to skim through the Pydon’t about the dunder attribute __name__. (P.S. doing
isinstance(groceries, set) would also work here!)
To make sure that order really doesn’t matter in sets, we can try comparing this set with other sets containing
the same elements, but written in a different order:
>>> groceries = {"milk", "cheese", "chocolate"}
>>> groceries == {"cheese", "milk", "chocolate"}
True
>>> groceries == {"chocolate", "milk", "cheese"}
True
Uniqueness
Another key property of (mathematical) sets is that there are no duplicate elements. It’s more or less as if
someone told you to go buy cheese, and when you get back home, that person screams from another room:
“Did you buy cheese?”
This is a yes/no question: you either bought cheese or you didn’t.
For sets, the same thing happens: the element is either in the set or it isn’t. We don’t care about element
count. We don’t even consider it.
Here’s proof that Python does the same:
>>> {"milk", "cheese", "milk", "chocolate", "milk"}
{'cheese', 'milk', 'chocolate'}
Creation
There are three main ways to create a set.
Explicit {} notation
Using the {} notation, you write out the elements of the set inside braces in a comma-separated list:
>>> {1, 2, 3}
{1, 2, 3}
>>> {"cheese", "ham"}
{'cheese', 'ham'}
>>> {"a", "b", "c"}
{'c', 'a', 'b'}
By the way, you cannot use {} to create an empty set! {} by itself will create an empty dictionary. To create
empty sets, you need the next method.
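>>> type({}).__name__
'dict'
## ↑ different ↓
>>> type(set()).__name__
'set'
Calling set()
You can also call the built-in set with an iterable, and you get a set with the elements of that iterable (each
one only once):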
>>> set("mississippi")
{'s', 'i', 'p', 'm'}
Calling set() by itself will produce an empty set.
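set accepts any iterable, not just strings, and keeps one copy of each element. A short sketch with a few different iterables:

```python
from_list = set([3, 1, 3, 2, 1])
from_string = set("mississippi")
from_dict = set({"milk": 2, "cheese": 1})   # iterating over a dict yields its keys
empty = set()                                 # {} would be an empty dict instead

print(from_list == {1, 2, 3})                   # True
print(from_string == {"m", "i", "s", "p"})      # True
print(from_dict == {"milk", "cheese"})          # True
print(type(empty).__name__, type({}).__name__)  # set dict
print(len(empty))                               # 0
```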
Set comprehensions
Using {}, one can also write what’s called a set comprehension. Set comprehensions are very similar to list
comprehensions, so learning about list comprehensions will be helpful here.
I’ll just show a couple of brief examples.
First, one that filters the elements we want to include:
>>> veggies = ["broccoli", "carrot", "tomato", "pepper", "lettuce"]
>>> {veggie for veggie in veggies if "c" in veggie}
{'lettuce', 'carrot', 'broccoli'}
And secondly, a set comprehension with two nested for loops:
>>> veggies = ["broccoli", "carrot", "tomato", "pepper", "lettuce"]
>>> {char for veggie in veggies for char in veggie}
{'c', 'u', 't', 'o', 'p', 'b', 'l', 'i', 'a', 'e', 'm', 'r'}
I’ll be writing a thorough Pydon’t about all types of comprehensions that Python supports, so be sure to
subscribe to the newsletter in order to not miss that upcoming Pydon’t!
>>> groceries.add("cheese")
>>> groceries
{'milk', 'cheese', 'chocolate'}
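Adding an element that is already in the set leaves the set unchanged. Here is a short sketch of that and of a few other common operations, some that mutate a set and some that combine two sets into a new one. The contents of treats are hypothetical, chosen only for this illustration:

```python
groceries = {"milk", "cheese", "chocolate"}
treats = {"chocolate", "cookies"}   # hypothetical contents, for illustration only

groceries.add("cheese")             # already there: the set does not change
groceries.add("bread")              # new element: the set grows
groceries.discard("bread")          # remove it (discard never raises if missing)
print(groceries == {"milk", "cheese", "chocolate"})  # True

print(groceries & treats)           # intersection: {'chocolate'}
print(sorted(groceries | treats))   # union: ['cheese', 'chocolate', 'cookies', 'milk']
print(sorted(groceries - treats))   # difference: ['cheese', 'milk']
print(sorted(groceries ^ treats))   # symmetric difference: ['cheese', 'cookies', 'milk']
```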
Iteration
I often relate sets with lists (and tuples). Sets are similar to lists with unique elements, but lists are ordered:
a list can be traversed from the beginning to the end, and a list can be indexed.
While sets can also be iterated over (in an order you can’t rely on),
>>> for item in groceries:
...     print(item)
...
cheese
milk
chocolate
sets cannot be indexed directly:
>>> groceries[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable
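If you really need positions, one option is to build a sequence from the set first. Sorting is a convenient way to do that, because it also gives you an order you can rely on. A minimal sketch:

```python
groceries = {"milk", "cheese", "chocolate"}

ordered = sorted(groceries)   # a new list, in a predictable order
print(ordered)                # ['cheese', 'chocolate', 'milk']
print(ordered[0])             # cheese
```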
>>> {"cheese", "milk"} < groceries
True
>>> groceries < groceries
False
>>> {"cheese", "milk"} <= groceries
True
>>> groceries <= groceries
True
>>> treats > {"chocolate"}
True
>>> treats >= {"chocolate", "cheese"}
False
Notice that most of the operator-based operations have corresponding method calls. The corresponding
method calls can accept an arbitrary iterable, whereas the operator-based versions expect sets.
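A quick sketch of that difference: the methods happily take a list, while the operator raises a TypeError when one side is not a set (or a frozenset):

```python
groceries = {"milk", "cheese", "chocolate"}

# Methods accept any iterable, such as a list...
print(groceries.issuperset(["milk", "cheese"]))   # True
print(sorted(groceries.union(["bread"])))         # ['bread', 'cheese', 'chocolate', 'milk']

# ...but the operators want sets (or frozensets) on both sides.
operator_rejected_list = False
try:
    groceries | ["bread"]
except TypeError:
    operator_rejected_list = True
print(operator_rejected_list)                     # True
```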
Mutability
Sets are mutable: they can change, which is what “mutable” means in English.
As I showed you above, the contents of sets can change, for example through calls to the methods .add and
.pop.
However, if you need to create an object that behaves like a set (i.e. where order doesn’t matter and where
uniqueness is guaranteed) but that you don’t want to be changed, then you want to create a frozenset.
An instance of a frozenset is pretty much like a set, except that frozenset isn’t mutable. In other words,
a frozenset is immutable: it can’t be mutated, it was frozen.
To create a frozenset, you just call the appropriate class:
>>> groceries_ = frozenset(groceries)
>>> # Can't add items:
>>> groceries_.add("beans")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>> # Can't pop items:
>>> groceries_.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'pop'
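Operations that build a new collection still work on a frozenset; they just return a new object instead of changing the original one. In this sketch, notice that the result of a binary operation takes the type of the left operand, and that a frozenset compares equal to a set with the same elements:

```python
groceries = {"milk", "cheese", "chocolate"}
frozen = frozenset(groceries)

more = frozen | {"beans"}                  # a new frozenset; frozen itself is untouched
print(type(more).__name__)                 # frozenset
print(type({"beans"} | frozen).__name__)   # set: the left operand decides
print("beans" in frozen)                   # False
print(frozen == groceries)                 # True: same elements, so equal
```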
There’s a very similar pair of built-in types that have this same dichotomy: lists and tuples. Lists are mutable
(they have the methods .append and .pop, for example) whereas tuples are immutable (they don’t have the
methods .append or .pop, nor can you assign directly to indices):
## Lists are mutable:
>>> l = [0, 1, 2]
>>> l[0] = 73
>>> l.pop()
2
>>> l.append(42)
>>> l
[73, 1, 42]
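For the other half of the comparison, this sketch shows that a tuple offers neither of those methods and rejects item assignment:

```python
t = (0, 1, 2)

# Tuples have no .append or .pop methods...
print(hasattr(t, "append"), hasattr(t, "pop"))   # False False

# ...and you cannot assign to an index either.
assignment_failed = False
try:
    t[0] = 73
except TypeError as error:
    assignment_failed = True
    print(error)   # 'tuple' object does not support item assignment

print(t)           # (0, 1, 2): unchanged
```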
To be (hashable) or not to be
An object that is hashable is an object for which a hash can be computed, hence, hash-able.
A hash is an integer that the built-in function hash computes to help with fast operations with dictionaries,
e.g. key lookups.
The built-in function hash knows how to work with some types of objects, and not with others. This dictates
what can and cannot be a dictionary key: if an object is hashable, it can be a dictionary key; if it isn’t
hashable, it cannot be a dictionary key.
For example, lists are mutable and unhashable, and hence they cannot be dictionary keys. Attempting to use
a list as a dictionary key raises an error:
>>> d = {}
>>> d[[1, 2, 3]] = 73
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
However, the tuple – list’s sibling – is immutable, and a tuple is hashable as long as all of its elements are
hashable. A tuple like (1, 2, 3) can be used as a dictionary key:
>>> d = {}
>>> d[(1, 2, 3)] = 73
>>> d
{(1, 2, 3): 73}
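One nuance: a tuple is only hashable if every element inside it is hashable, so a tuple that contains a list cannot be a dictionary key either. A quick sketch:

```python
print(hash((1, 2, 3)) == hash((1, 2, 3)))   # True: equal tuples hash equally

tuple_with_list_is_hashable = True
try:
    hash((1, [2, 3]))                        # the list inside is unhashable
except TypeError as error:
    tuple_with_list_is_hashable = False
    print(error)                             # unhashable type: 'list'

print(tuple_with_list_is_hashable)           # False
```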
Similarly, because sets are mutable, they are not hashable. However, frozensets are not mutable, and
they are also hashable! A set cannot be a dictionary key, but a frozenset can:
>>> d = {}
>>> d[groceries] = 73
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> d[frozenset(groceries)] = 73
>>> d
{frozenset({'cheese', 'milk', 'chocolate'}): 73}
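Because equal frozensets have equal hashes, you can look a key up with a frozenset built from the same items in any order. Frozensets can also be elements of other sets. A short sketch:

```python
groceries = {"milk", "cheese", "chocolate"}
d = {frozenset(groceries): 73}

# Same elements, different order: equal frozensets, so the same key.
print(d[frozenset(["chocolate", "milk", "cheese"])])   # 73

# Frozensets can be set elements; these two are equal, so only one is kept.
pairs = {frozenset({"milk", "cheese"}), frozenset({"cheese", "milk"})}
print(len(pairs))                                        # 1
```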
These properties, uniqueness of elements and fast membership checking, are the main reasons why the
programmers who wrote the pieces of code I will be showing you chose to use set.
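Fast membership checking is the property most of the following examples rely on. This rough timing sketch with timeit compares the in operator on a list, which scans the elements one by one, against in on a set, which uses hashing. The exact timings depend on your machine, so none are shown here:

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999            # near the end of the list: a bad case for a linear scan

list_time = timeit.timeit(lambda: target in items_list, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)

print(target in items_list, target in items_set)   # True True
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```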
Examples in code
The examples that follow are my attempts at showing you good usages of the built-in types set and
frozenset.
seen_actions = set()
seen_non_default_actions = set()
When you create a command line application with argparse, you have to specify the options that your
command takes. For example, -v for verbose output or -h to display the help message.
Sometimes, there may be conflicting options. For example, if you provide -v for verbose output, and also -q
for quiet output, then it won’t make sense to specify both at the same time.
The action_conflicts dictionary will keep track of what things conflict with what.
Later, we initialise two empty sets, seen_actions and seen_non_default_actions. Now, every time we see
an action, we add it to the set that contains all actions that have been seen.
Then, if that action was really specified by the user, we add it to the set of actions that didn’t have the default
value.
Finally, we access the action_conflicts to get a list of all the actions that are incompatible with the action
we are parsing now. If any conflicting action shows up in the set of actions we already saw, then
we raise an error!
Later down the road, we can also find the following:
## In Lib/argparse.py from Python 3.9.2
class ArgumentParser(_AttributeHolder, _ActionsContainer):
    # ...
    def _parse_known_args(self, arg_strings, namespace):
        # ...
        seen_actions = set()
        seen_non_default_actions = set()
        # ...
        # make sure all required actions were present and also convert
        # action defaults which were not given as arguments
        required_actions = []
        for action in self._actions:
            if action not in seen_actions:
                if action.required:
                    required_actions.append(_get_action_name(action))
                # ...
        if required_actions:
            self.error(_('the following arguments are required: %s') %
                       ', '.join(required_actions))
Once more, we are using the set seen_actions for fast membership checking: we traverse all the actions
that the command line interface knows about, and we keep track of all the required actions that the user
didn’t specify/mention.
After that, if there are any actions in the list required_actions, then we let the user know that they forgot
some things.
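You can watch this check fire from the outside with a tiny parser that has one required option. The program name demo and the option --name are made up for this sketch. argparse prints the error to standard error and exits with status 2, which the sketch intercepts:

```python
import argparse
import contextlib
import io

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--name", required=True)   # hypothetical required option

exit_code = None
error_output = io.StringIO()
with contextlib.redirect_stderr(error_output):
    try:
        parser.parse_args([])                   # the user "forgot" --name
    except SystemExit as exit_info:
        exit_code = exit_info.code

print(exit_code)                 # 2
print(error_output.getvalue())   # usage line plus the "required" error message
```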
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-t", action="store_true")
args = parser.parse_args()
Now open your terminal in the directory where foo.py lives:
> python foo.py -ttt
_StoreTrueAction(option_strings=['-t'], dest='t', nargs=0, const=True, default=False, type=None, choices
_StoreTrueAction(option_strings=['-t'], dest='t', nargs=0, const=True, default=False, type=None, choices
_StoreTrueAction(option_strings=['-t'], dest='t', nargs=0, const=True, default=False, type=None, choices
You get three lines of identical output, one for each time you typed a t in the command.
So, we see that we have duplicate actions showing up… Shouldn’t we check if an action has been added
before adding it? Something like
## Hypothetical change to Lib/argparse.py from Python 3.9.2 (an anti-pattern!)
class ArgumentParser(_AttributeHolder, _ActionsContainer):
    # ...
    def _parse_known_args(self, arg_strings, namespace):
        # ...
        def take_action(action, argument_strings, option_string=None):
            if action not in seen_actions:
                seen_actions.add(action)
            # ...
No! Don’t do that! This is an anti-pattern that repeats unnecessary work! Checking if an element is
inside a set and adding it unconditionally take about the same amount of work, so checking if it is there and
then adding it roughly doubles the work you do for every new action!
The set already handles uniqueness for you, so you don’t have to worry about enforcing it. In that sense, this
is a great example usage of sets.
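A small sketch of why the check is redundant: with a hypothetical stream of repeated actions (like -ttt above), checking first and adding unconditionally end up with exactly the same set:

```python
# A hypothetical stream of actions, with repeats.
incoming = ["-t", "-t", "-t", "-v"]

check_then_add = set()
for action in incoming:
    if action not in check_then_add:   # redundant: add() already ignores duplicates
        check_then_add.add(action)

add_unconditionally = set()
for action in incoming:
    add_unconditionally.add(action)    # duplicates are simply ignored

print(check_then_add == add_unconditionally)   # True
print(sorted(add_unconditionally))             # ['-t', '-v']
```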
import string
## ...
## lookup table for whether 7-bit ASCII chars are valid in a Python identifier
_IS_ASCII_ID_CHAR = [(chr(x) in _ASCII_ID_CHARS) for x in range(128)]
## lookup table for whether 7-bit ASCII chars are valid as the first
## char in a Python identifier
_IS_ASCII_ID_FIRST_CHAR = \
[(chr(x) in _ASCII_ID_FIRST_CHARS) for x in range(128)]
Granted, the snippet above does not tell you what the variables _IS_ASCII_ID_CHAR and _IS_ASCII_ID_FIRST_CHAR
are for, but it is quite clear that those two are being built through list comprehensions that do membership
checking on _ASCII_ID_CHARS and _ASCII_ID_FIRST_CHARS. In turn, these two variables are frozensets
of characters!
So there you have it! One more usage of sets for fast membership checking.
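The definitions of _ASCII_ID_CHARS and _ASCII_ID_FIRST_CHARS are elided above, so this sketch assumes hypothetical stand-ins (ASCII letters, digits, and the underscore, with digits not allowed first) just to show how such lookup tables might be built from frozensets and then used. The helper looks_like_ascii_identifier is also made up for the illustration:

```python
import string

# Hypothetical stand-ins for the elided definitions (assumptions, not the real code).
ascii_id_chars = frozenset(string.ascii_letters + string.digits + "_")
ascii_id_first_chars = frozenset(string.ascii_letters + "_")

# One boolean per 7-bit ASCII code point, built with membership checks on the frozensets.
is_ascii_id_char = [(chr(x) in ascii_id_chars) for x in range(128)]
is_ascii_id_first_char = [(chr(x) in ascii_id_first_chars) for x in range(128)]


def looks_like_ascii_identifier(text):
    """Hypothetical helper: is `text` a non-empty run of ASCII identifier characters?"""
    codes = [ord(char) for char in text]
    if not codes or max(codes) >= 128:
        return False
    return is_ascii_id_first_char[codes[0]] and all(is_ascii_id_char[code] for code in codes[1:])


print(looks_like_ascii_identifier("seen_actions"))  # True
print(looks_like_ascii_identifier("2fast"))         # False
print(looks_like_ascii_identifier("café"))          # False: 'é' is not 7-bit ASCII
```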
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Use (frozen) sets when you are dealing with collections and where what matters is (fast)
membership checking.”
This Pydon’t showed you that:
• sets are (mathematical) objects that contain elements;
– the elements are unique; and
– their ordering doesn’t matter.
• the built-in type set provides an implementation for the mathematical concept of set;
• the frozenset is an immutable and hashable version of set;
• tuples are to lists as frozen sets are to sets;
• you can create sets with
– {} enclosing a comma-separated list of items;
– set() and an iterable; and
– set comprehensions.
• sets have operations that allow you to mutate them (like .add and .pop), among many others;
• you can combine sets in many different ways, with operators like & and |;
• you can check for set containment with <, <=, >=, >;
• you should use frozenset if you know the collection of objects won’t change;
• (frozen) sets are often used for fast membership checking; and
• unconditionally adding to a set is faster than checking for membership first and adding later.
If you liked this Pydon’t be sure to leave a reaction below and share this with your friends and fellow
Pythonistas. Also, subscribe to the newsletter so you don’t miss a single Pydon’t!
References
• Python 3 Docs, The Python Standard Library, Built-in Types, Set Types – set, frozenset, https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset [last accessed 14-09-2021];
• Python 3 Docs, The Python Standard Library, Built-in Functions, hash, https://docs.python.org/3/library/functions.html#hash [last accessed 14-09-2021];
• Python 3 Docs, The Python Language Reference, Special method names, object.__hash__, https://docs.python.org/3/reference/datamodel.html#object.__hash__ [last accessed 14-09-2021];
• Python 3 Docs, The Python Standard Library, argparse, https://docs.python.org/3/library/argparse.html [last accessed 14-09-2021];
List comprehensions 101
Introduction
List comprehensions are, hands down, one of my favourite Python features.
It’s not THE favourite feature, but that’s because Python has a lot of things I really like, and list
comprehensions are one of those.
This Pydon’t (the first in a short series) will cover the basics of list comprehensions.
In this Pydon’t, you will:
• learn the anatomy of a list comprehension;
– learn how to create list comprehensions; and
– understand the building blocks of list comprehensions;
• see the parallel that exists between some for loops and list comprehensions;
• establish a correspondence between map and filter, and list comprehensions;
• understand the main use-case for this feature; and
• see good usages of list comprehensions in real code written by real people.
I also summarised the contents of this Pydon’t in a cheatsheet that you can get for free from here.
The first part is a pair of opening and closing brackets that delimit the list comprehension. The brackets, by
themselves, do not automatically indicate a list comprehension, because they can also be used to create list
literals, like [1, 2, 3].
The second part is the expression that you apply to each element of the initial seed data you are using. This
is often a function call or another expression, like an arithmetic expression, that transforms each element
into a new one. In the diagram above, this is represented by func(elem).
The third part is the for component that establishes what the initial data is, and where we are going to draw
our elements from. This is akin to the for ... in ... of a standard for loop. In fact, it looks exactly the
same as the initial statement of a for loop, and it is represented by for elem in iterable in the diagram
above.
The fourth part, which is optional, is an if statement. This if statement is used to filter elements from the
initial seed data, in case we want to ignore some of it/only use part of the data. This is represented by the
cond(elem) above.
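Putting the four parts together in runnable form requires concrete definitions for the labels used above. In this sketch, func, cond, and iterable are placeholder definitions chosen only for the illustration:

```python
# Placeholder definitions for the labels used above (assumptions for this sketch).
def func(elem):
    return elem * 10


def cond(elem):
    return elem % 2 == 1


iterable = range(6)

#        expression   for component        optional if
result = [func(elem) for elem in iterable if cond(elem)]
print(result)  # [10, 30, 50]
```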
Enough of theoretical gibberish, let’s look at some actual list comprehensions.
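1. Square the even numbers below 10. This is the same comprehension that appears in the “Squaring” comparison with for loops:

```python
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
```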
2. Uppercase the words that are entirely lower case:
>>> words = "This is Sparta!".split()
>>> [word.upper() for word in words if word == word.lower()]
['IS']
The result only contains the word “is” because that’s the only word that was entirely lower case in the original
sentence “This is Sparta!”.
3. Find the length of each word that does not have punctuation next to it:
>>> words = "To be or not to be, that is the question.".split()
>>> words
['To', 'be', 'or', 'not', 'to', 'be,', 'that', 'is', 'the', 'question.']
>>> [len(word) for word in words if word.isalpha()]
[2, 2, 2, 3, 2, 4, 2, 3]
The final result only contains 8 numbers (while the original sentence contains 10 words) because the words
“be,” and “question.” had punctuation next to them.
This is, in fact, one of the most common patterns that list comprehensions are useful for.
If you find a piece of code that initialises an empty list, and then uses a for loop to populate it with data,
that’s probably a good use case for a list comprehension.
Of course, this isn’t always doable in a sensible way: list comprehensions are not meant to replace all for
loops. But if you have a short loop exhibiting the structure above, then that could probably be replaced by a
list comprehension.
I challenge you to do just that. Go through some code of yours and look for that pattern. Then, try replacing
it with a list comprehension.
Here are the list comprehensions from before, with the equivalent for loops:
1. Squaring:
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
## ↑
## ↓
even_squares = []
for n in range(10):
    if n % 2 == 0:
        even_squares.append(n ** 2)
2. Upper casing words:
words = "This is Sparta!".split()
upper_cased = [word.upper() for word in words if word == word.lower()]
## ↑
## ↓
upper_cased = []
for word in words:
    if word == word.lower():
        upper_cased.append(word.upper())
3. Finding length of words:
words = "To be or not to be, that is the question.".split()
lengths = [len(word) for word in words if word.isalpha()]
## ↑
## ↓
lengths = []
for word in words:
    if word.isalpha():
        lengths.append(len(word))
>>> lists = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
>>> [elem for sublist in lists for elem in sublist]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
The second loop doesn’t need to depend explicitly on the first one; it can iterate over another iterable, with
its own loop variable. When you do so, all the loop variables become available to be used in the expression
on the left.
This pattern arises naturally when you want to combine information from two or more data sources:
>>> colours = ["red", "green", "blue"]
>>> clothes = ["t-shirt", "shirt"]
>>> [f"{colour} {clothing}" for colour in colours for clothing in clothes]
['red t-shirt', 'red shirt', 'green t-shirt', 'green shirt', 'blue t-shirt', 'blue shirt']
Notice that, here, to “combine” the information means to create all different pairings with the data from
one and the other iterable. If you want to create pairings by traversing two iterables in parallel, then you
should read up on zip.
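To make the difference concrete, this sketch builds all pairings with the nested comprehension, checks that itertools.product yields the same pairings, and contrasts them with zip, which walks both lists in parallel and stops at the shorter one:

```python
from itertools import product

colours = ["red", "green", "blue"]
clothes = ["t-shirt", "shirt"]

all_pairings = [f"{colour} {clothing}" for colour in colours for clothing in clothes]
same_pairings = [f"{colour} {clothing}" for colour, clothing in product(colours, clothes)]
parallel = [f"{colour} {clothing}" for colour, clothing in zip(colours, clothes)]

print(all_pairings == same_pairings)  # True
print(len(all_pairings))              # 6
print(parallel)                       # ['red t-shirt', 'green shirt']
```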
Nesting if statements
Much like you can nest for loops to iterate over more iterables, you can nest if statements to create stricter
filters.
When you have a series of if statements, the second condition is only checked if the first one passed; the third condition is only checked if the second one passed; and so on.
However, with if statements, this is the same as combining the successive conditions with and. That’s because Boolean short-circuiting makes sure that later conditions only get evaluated if the earlier ones evaluated to Truthy values.
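This means there is a series of equivalences when we think about list comprehensions with nested if statements: nesting two ifs in a list comprehension is the same as nesting two if statements inside a for loop, which in turn is the same as using a single if with both conditions joined by and.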
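As a sketch of those equivalences, the three versions below build the same list. The two conditions are arbitrary and were picked only for illustration.

```python
# Two successive `if` clauses in a comprehension...
nested_ifs = [n for n in range(30) if n % 2 == 0 if n % 3 == 0]

# ...behave like nested `if` statements inside the loop...
loop_version = []
for n in range(30):
    if n % 2 == 0:
        if n % 3 == 0:
            loop_version.append(n)

# ...and like a single `if` whose conditions are joined with `and`.
and_version = [n for n in range(30) if n % 2 == 0 and n % 3 == 0]

print(nested_ifs)  # [0, 6, 12, 18, 24]
```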
Arbitrary nesting
The two sections above showed you that you can nest multiple for loops, and also multiple if statements.
Now, the only thing left for you to know is that these can be mixed and nested arbitrarily. Of course, you
should not nest things too much, because a long list comprehension is harder to read than the equivalent
nested formulation.
The diagram below helps you understand the correspondence between the order of things in the nested formulation and the left-to-right ordering of things in the list comprehension.
The further you are to the right in a list comprehension, the deeper you are in the equivalent nested formulation.
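For instance, here is a comprehension that mixes a for, an if, and a second for, next to the nested statements it corresponds to. Each clause, read from left to right, sits one level deeper than the previous one. The word list is made up for illustration.

```python
words = ["spam", "Eggs", "ham"]

# Comprehension: for -> if -> for, read left to right.
letters = [char for word in words if word.islower() for char in word]

# Equivalent nested formulation: each clause is nested inside the previous one.
nested = []
for word in words:
    if word.islower():
        for char in word:
            nested.append(char)

print(letters)  # ['s', 'p', 'a', 'm', 'h', 'a', 'm']
```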
List comprehensions instead of map and filter
List comprehensions are often deemed a more Pythonic replacement for calls to the built-in functions map
and filter.
map takes a function and applies it to all elements of an iterable, and that’s straightforward to do with a list comprehension.
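For example, to square some numbers, both lines below produce the same values. The call to list is there because map returns an iterator rather than a list.

```python
numbers = [1, 2, 3, 4]

def square(n):
    return n * n

# Using the built-in map...
with_map = list(map(square, numbers))
# ...and the equivalent list comprehension.
with_comprehension = [square(n) for n in numbers]

print(with_map)            # [1, 4, 9, 16]
print(with_comprehension)  # [1, 4, 9, 16]
```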
Similarly, the built-in filter can often be replaced with a more Pythonic list comprehension.
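Here is the same kind of side-by-side comparison for filter, where the filtering function becomes the if clause of the comprehension:

```python
numbers = [1, 2, 3, 4, 5, 6]

def is_even(n):
    return n % 2 == 0

# Using the built-in filter (wrapped in list to materialise the iterator)...
with_filter = list(filter(is_even, numbers))
# ...and the equivalent list comprehension with an `if` clause.
with_comprehension = [n for n in numbers if is_even(n)]

print(with_filter)  # [2, 4, 6]
```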
Please bear in mind that the list comprehension versions of map and filter are not equivalent to using map and filter. The underlying data is the same, but the containers themselves are different: map and filter return lazy iterators, whereas a list comprehension builds a list.
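A quick way to see the difference: the iterator returned by map produces its values on demand and can only be consumed once, while the list built by the comprehension can be traversed as many times as you like.

```python
squares_map = map(lambda n: n * n, range(4))
squares_list = [n * n for n in range(4)]

print(type(squares_map).__name__)   # map
print(type(squares_list).__name__)  # list

# The iterator is exhausted after one pass...
print(list(squares_map))  # [0, 1, 4, 9]
print(list(squares_map))  # []
# ...whereas the list can be reused.
print(list(squares_list))  # [0, 1, 4, 9]
```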
Not only that, but I’m also not saying that map and filter are useless. A later Pydon’t will be devoted to
understanding when to use map and filter, so make sure to subscribe to the newsletter to not miss that
Pydon’t.
Examples in code
Random data
A neat little example of where a list comprehension is the way to go is when generating some random data.
For example, to generate three integers to represent an RGB colour,
>>> from random import randint
>>> r, g, b = [randint(0, 255) for _ in range(3)]
>>> r
180
>>> g
148
>>> b
188
or when generating a random string:
>>> from string import ascii_lowercase, ascii_uppercase
>>> from random import choice
>>> "".join([choice(ascii_lowercase + ascii_uppercase) for _ in range(16)])
'qMQlkhvKJfdZGBEZ'
# ...
Looking at the code above, we can see that the list service_prefixes is being created and then appended
to in the for loop; also, that’s the only purpose of that for loop.
This is the generic pattern that indicates a list comprehension might be useful!
Therefore, we can replace the loop with a list comprehension. The variable count is superfluous because it
keeps track of the length of the resulting list, something we can find out easily with the function len.
Here is a possible alternative using a list comprehension:
def get_service_prefixes(amazon_service):
    service_prefixes = [
        prefix for prefix in get_aws_prefixes()
        if amazon_service in prefix["service"]
    ]
    count = len(service_prefixes)
    # ...
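To check that the two formulations agree, here is a self-contained sketch. The function get_aws_prefixes is replaced by a hypothetical stub with made-up data, and service_prefixes_loop is a hypothetical reconstruction of the pattern described above: a list that is appended to inside a loop, plus a count variable.

```python
def get_aws_prefixes():
    """Hypothetical stand-in returning made-up prefix records."""
    return [
        {"ip_prefix": "10.0.0.0/24", "service": "AMAZON"},
        {"ip_prefix": "10.0.1.0/24", "service": "EC2"},
        {"ip_prefix": "10.0.2.0/24", "service": "S3"},
    ]

def service_prefixes_loop(amazon_service):
    # Build the list by appending, tracking its length by hand.
    service_prefixes = []
    count = 0
    for prefix in get_aws_prefixes():
        if amazon_service in prefix["service"]:
            service_prefixes.append(prefix)
            count += 1
    return service_prefixes, count

def service_prefixes_comprehension(amazon_service):
    # Same filter as a list comprehension; len replaces the counter.
    service_prefixes = [
        prefix for prefix in get_aws_prefixes()
        if amazon_service in prefix["service"]
    ]
    return service_prefixes, len(service_prefixes)

print(service_prefixes_loop("EC2") == service_prefixes_comprehension("EC2"))  # True
```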
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“List comprehensions are a powerful Python feature that is useful for building lists.”
This Pydon’t was also summarised in a free cheatsheet:
If you liked this Pydon’t, be sure to leave a reaction below and share this with your friends and fellow Pythonistas. Also, subscribe to the newsletter so you don’t miss a single Pydon’t!
References
• Python 3 Docs, The Python Tutorial, Data Structures, More on Lists, List Comprehensions, https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions [last accessed 24-09-2021];
• Python 3 Docs, The Python Standard Library, Built-in Functions, filter, https://docs.python.org/3/library/functions.html#filter [last accessed 22-09-2021];
• Python 3 Docs, The Python Standard Library, Built-in Functions, map, https://docs.python.org/3/library/functions.html#map [last accessed 22-09-2021].
Conditional expressions
Introduction
Conditional expressions are the closest thing Python has to what other languages call a “ternary operator”.
In this Pydon’t, you will:
• learn about the syntax of conditional expressions;
• understand the rationale behind conditional expressions;
• learn about the precedence of conditional expressions;
• see how to nest conditional expressions;
• understand the relationship between if: ... elif: ... else: statements and conditional expressions;
• see good and bad example usages of conditional expressions.
What is a conditional expression?
A conditional expression in Python is an expression (in other words, a piece of code that evaluates to a result)
whose value depends on a condition.
Conditions
We are very used to using if statements to run pieces of code when certain conditions are met. Rewording
that, a condition can dictate what piece(s) of code run.
In conditional expressions, we will use a condition to change the value to which the expression evaluates.
Wait, isn’t this the same as an if statement? No! Statements and expressions are not the same thing.
Syntax
Instead of beating around the bush, let me just show you the anatomy of a conditional expression:
expr_if_true if condition else expr_if_false
A conditional expression is composed of three sub-expressions and the keywords if and else. None of
these components are optional. All of them have to be present.
How does this work?
First, condition is evaluated. Then, depending on whether condition evaluates to Truthy or Falsy, the
expression evaluates expr_if_true or expr_if_false, respectively.
As you may be guessing from the names, expr_if_true and expr_if_false can be arbitrary expressions.
This means they can be simple literal values like 42 or "spam", or other “complicated” expressions.
(Heck, the expressions in conditional expressions can even be other conditional expressions! Keep reading for that.)
2.
>>> 42 if False else 0
0
3.
>>> "Mathspp".lower() if pow(3, 27, 10) > 5 else "Oh boy."
'mathspp'
For reference:
>>> pow(3, 27, 10)
7
Reading a conditional expression
While the conditional expression presents the operands in an order that may throw some of you off, it is easy
to read it as an English sentence.
Take this reference conditional expression:
value if condition else other_value
Here are two possible English “translations” of the conditional expression:
“Evaluate to value if condition is true, otherwise evaluate to other_value.”
or
“Give value if condition is true and other_value otherwise.”
With this out of the way, …
Rationale
The rationale behind conditional expressions is simple to understand: programmers are often faced with a
situation where they have to pick one of two values.
That’s just it.
Whenever you find yourself having to choose between one value or another, typically inside an if: ... else:
... block, that might be a good use-case for a conditional expression.
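Here are two such situations. The first one is deciding the parity of an integer; written with an if: ... else: ... block, it could look like the sketch below (consistent with the refactored parity shown later):

```python
def parity(n):
    # Pick one of two values depending on the remainder of n divided by 2.
    if n % 2:
        return "odd"
    else:
        return "even"
```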
>>> parity(15)
'odd'
>>> parity(42)
'even'
2. computing the absolute value of a number (this already exists as a built-in function):
def abs(x):
    if x > 0:
        return x
    else:
        return -x
>>> abs(10)
10
>>> abs(-42)
42
These two functions have very similar structures: they check a condition and return a given value if the condition evaluates to True. If it doesn’t, they return a different value.
Refactored examples
Can you refactor the functions above to use conditional expressions? Here is one possible refactoring for
each:
def parity(n):
    return "odd" if n % 2 else "even"
This function now reads as
“return "odd" if n leaves a remainder when divided by 2 and "even" otherwise.”
As for the absolute value function,
def abs(x):
    return x if x > 0 else -x
it now reads as
“return x if x is positive, otherwise return -x.”
Short-circuiting
You may be familiar with Boolean short-circuiting, in which case you might be pleased to know that conditional
expressions also short-circuit.
For those of you who don’t know Boolean short-circuiting yet, I can recommend my thorough Pydon’t article on the subject. Either way, here is what matters for our conditional expressions: a conditional expression will only evaluate what it really has to.
In other words, if your conditional expression looks like
expr_if_true if condition else expr_if_false
then only one of expr_if_true and expr_if_false is ever evaluated. This might look silly to point out, but
is actually quite important.
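You can watch this happen with a hypothetical helper, branch, that records every time it is called. Only the branch selected by the condition ever runs:

```python
calls = []

def branch(name):
    """Hypothetical helper that records which branch was evaluated."""
    calls.append(name)
    return name

result = branch("if_true") if 1 > 2 else branch("if_false")
print(result)  # if_false
print(calls)   # ['if_false']: branch("if_true") never ran
```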
Sometimes, we might want to do something (expr_if_true) that only works if a certain condition is met.
For example, say we want to implement the quad-UCS function from APL. That function is simple to explain: it converts integers into characters and characters into integers. In Python-speak, it just uses chr or ord, whichever makes sense for the input.
Here is an example implementation:
def ucs(x):
    if isinstance(x, int):
        return chr(x)
    else:
        return ord(x)
>>> ucs("A")
65
>>> ucs(65)
'A'
>>> ucs(102)
'f'
>>> ucs("f")
102
What isn’t clear from this piece of code is that ord raises an error when called on integers, and chr fails when called on characters:
>>> ord(65)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ord() expected string of length 1, but int found
>>> chr("f")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: an integer is required (got type str)
Thankfully, this is not a problem for conditional expressions, and therefore ucs can be implemented with
one:
def ucs(x):
    return chr(x) if isinstance(x, int) else ord(x)
>>> ucs("A")
65
>>> ucs(65)
'A'
>>> ucs(102)
'f'
>>> ucs("f")
102
Therefore, we see that when x is an integer, ord(x) never runs. On the flip side, when x is not an integer,
chr(x) never runs. This is a very useful subtlety!
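Conditional expressions can also stand in for if: ... elif: ... else: ... blocks. Take a sign function that returns -1, 0, or 1, depending on the sign of its argument. Written with an elif (matching the “Compare” listing further down), it could look like this:

```python
def sign(x):
    # Three possible outcomes, hence the `elif`.
    if x == 0:
        return 0
    elif x > 0:
        return 1
    else:
        return -1
```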
>>> sign(-73)
-1
>>> sign(0)
0
>>> sign(42)
1
How can we write this as a conditional expression? Conditional expressions do not allow the usage of the
elif keyword so, instead, we start by reworking the if block itself:
def sign(x):
    if x == 0:
        return 0
    else:
        if x > 0:
            return 1
        else:
            return -1
This isn’t a great implementation, but this intermediate representation makes it clearer that the bottom of
the if block can be replaced with a conditional expression:
def sign(x):
    if x == 0:
        return 0
    else:
        return 1 if x > 0 else -1
Now, if we abstract away from the fact that the second return value is a conditional expression itself, we can
rewrite the existing if block as a conditional expression:
def sign(x):
    return 0 if x == 0 else (1 if x > 0 else -1)
>>> sign(-73)
-1
>>> sign(0)
0
>>> sign(42)
1
This shows that conditional expressions can be nested naturally. Now it is just a matter of checking whether the parentheses are needed or not.
In other words, if we write
A if B else C if D else E
does Python interpret it as
(A if B else C) if D else E
or does it interpret it as
A if B else (C if D else E)
As it turns out, it’s the latter. So, the sign function above can be rewritten as
def sign(x):
    return 0 if x == 0 else 1 if x > 0 else -1
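To convince yourself that Python groups these from the right, pick values for which the two readings disagree. In the sketch below, B is true and D is false, so only the left-grouped version falls through to E:

```python
A, B, C, D, E = "A", True, "C", False, "E"

unparenthesised = A if B else C if D else E
right_grouped = A if B else (C if D else E)
left_grouped = (A if B else C) if D else E

print(unparenthesised)  # A
print(right_grouped)    # A
print(left_grouped)     # E
```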
It’s this chain of if ... else ... if ... else ... – that can be arbitrarily long – that emulates elifs.
To convert from a long if block (with or without elifs) to a conditional expression, go from top to bottom
and interleave values and conditions, alternating between the keyword if and the keyword else.
When reading this aloud in English, the word “otherwise” helps clarify what the longer conditional expressions
mean:
return 0 if x == 0 else 1 if x > 0 else -1
reads as
“return 0 if x is 0; otherwise, return 1 if x is positive; otherwise, return -1.”
The repetition of the word “otherwise” becomes cumbersome, which is a good indicator that it is generally not a good idea to get carried away chaining several conditional expressions.
For reference, here’s a “side-by-side” comparison of the first conditional block and the final conditional
expression:
## Compare
if x == 0:
    return 0
elif x > 0:
    return 1
else:
    return -1
## to:
return 0 if x == 0 else 1 if x > 0 else -1
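A regular function can’t play the role of a conditional expression, because Python evaluates every argument before the function body even starts running. As a sketch, consider a hypothetical helper named cond that tries to mimic one:

```python
def cond(condition, value_if_true, value_if_false):
    """Hypothetical function mimicking `value_if_true if condition else value_if_false`."""
    if condition:
        return value_if_true
    else:
        return value_if_false

# It works fine as long as both values are safe to compute.
print(cond(2 > 1, "yes", "no"))  # yes
```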
Hence, we can’t use this cond to implement ucs:
def ucs(x):
    return cond(isinstance(x, int), chr(x), ord(x))
This code looks sane, but it won’t behave the way we would like:
>>> ucs(65)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in ucs
TypeError: ord() expected string of length 1, but int found
When given 65, the first argument evaluates to True, and the second argument evaluates to "A", but the third
argument raises an error!
Precedence
Conditional expressions have very low precedence: according to the documentation, only lambda expressions and assignment expressions (:=) have lower precedence.
This means that sometimes you may need to parenthesise a conditional expression if you are using it inside another expression.
For example, take a look at this function:
def foo(n, b):
    if b:
        to_add = 10
    else:
        to_add = -10
    return n + to_add
Now, suppose you try to shorten it with a conditional expression:
def foo(n, b):
    return n + 10 if b else -10
By doing this, you suddenly break the function when b is False:
>>> foo(42, False)
-10
That’s because the expression
n + 10 if b else -10
is seen by Python as
(n + 10) if b else -10
while you meant for it to mean
n + (10 if b else -10)
In other words, and in not-so-rigorous terms, the + “pulled” the neighbouring 10, and it’s the whole n + 10 that is seen as the expression to evaluate if the condition evaluates to Truthy.
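Adding parentheses around the conditional expression restores the intended behaviour. A corrected version of foo:

```python
def foo(n, b):
    # Parenthesise so the conditional expression only picks the amount to add.
    return n + (10 if b else -10)

print(foo(42, True))   # 52
print(foo(42, False))  # 32
```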
n > pow(10, 10) evaluates to… We return…
True True
False False
So, if the value of n > pow(10, 10) is the same as the thing we return, why don’t we just return n > pow(10, 10)? In fact, that’s what we should do:
def is_huge(n):
    return n > pow(10, 10)
Take this with you: never use if: ... else: ... or conditional expressions just to evaluate to/return True or False based on a condition. Often, it suffices to work with the condition alone.
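A quick check that nothing is lost: wrapping the comparison in a conditional expression gives back exactly what the comparison already produced. The name is_huge_verbose is hypothetical. When a condition is only Truthy or Falsy rather than a real Boolean, bool produces the True/False value directly.

```python
def is_huge_verbose(n):
    # Hypothetical redundant version: the conditional expression
    # just repeats the result of the comparison.
    return True if n > pow(10, 10) else False

def is_huge(n):
    return n > pow(10, 10)

print(is_huge_verbose(5), is_huge(5))                      # False False
print(is_huge_verbose(pow(10, 11)), is_huge(pow(10, 11)))  # True True

# For a Truthy/Falsy condition, use bool instead of `True if items else False`.
items = []
print(bool(items))  # False
```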
A related use case where conditional expressions shouldn’t be used is when assigning default values to
variables. Some of these default values can be assigned with Boolean short-circuiting, using the or operator.
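Here is a sketch of what that looks like, using a hypothetical greet function whose name falls back to a default when it is empty. Keep in mind that or replaces any Falsy value (including "" and 0), so this is only appropriate when all Falsy values should indeed trigger the default.

```python
def greet_verbose(name):
    # Works, but spells out by hand what `or` already does.
    name = name if name else "stranger"
    return f"Hello, {name}!"

def greet(name):
    # `or` gives back `name` if it is Truthy, otherwise the default.
    name = name or "stranger"
    return f"Hello, {name}!"

print(greet("Rodrigo"))  # Hello, Rodrigo!
print(greet(""))         # Hello, stranger!
```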
Examples in code
Here are a couple of examples where conditional expressions shine.
You will notice that these examples aren’t particularly complicated or require much context to understand
the mechanics of what is happening.
That’s because the rationale behind conditional expressions is simple: pick between two values.
Here is the full implementation of the .get method:
## From Lib/collections/__init__.py in Python 3.9.2
class ChainMap(_collections_abc.MutableMapping):
# ...
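The key idea is that a .get method picks one of two values: the stored value if the key is present, or the default otherwise. Below is a sketch of that idea on a hypothetical, much simplified mapping class called FirstMatch (this is not CPython’s ChainMap code), checked against the real collections.ChainMap:

```python
from collections import ChainMap

class FirstMatch:
    """Hypothetical toy mapping that looks a key up in several dicts, in order."""

    def __init__(self, *maps):
        self.maps = list(maps)

    def __contains__(self, key):
        return any(key in mapping for mapping in self.maps)

    def __getitem__(self, key):
        for mapping in self.maps:
            if key in mapping:
                return mapping[key]
        raise KeyError(key)

    def get(self, key, default=None):
        # Short-circuiting means self[key] only runs when the key exists,
        # so no KeyError can escape from here.
        return self[key] if key in self else default

defaults = {"colour": "red", "size": "M"}
overrides = {"colour": "blue"}

toy = FirstMatch(overrides, defaults)
real = ChainMap(overrides, defaults)
print(toy.get("colour"), real.get("colour"))      # blue blue
print(toy.get("price", 0), real.get("price", 0))  # 0 0
```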
Resolving paths
The module pathlib is great when you need to deal with paths. One of the functionalities it provides is the .resolve method, which takes a path and makes it absolute, resolving any symlinks along the way:
## Running this from C:/Users/rodri:
>>> Path("..").resolve()
WindowsPath('C:/Users')
## The current working directory is irrelevant here:
>>> Path("C:/Users").resolve()
WindowsPath('C:/Users')
Here is part of the code that resolves paths:
## In Lib/pathlib.py from Python 3.9.2
class _PosixFlavour(_Flavour):
# ...
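In the same spirit, here is a small sketch (not pathlib’s actual implementation) of a hypothetical helper that makes a path absolute by picking between two values: the path itself, if it is already absolute, or the path joined onto a base directory otherwise. It uses PurePosixPath so the result doesn’t depend on the machine it runs on and, unlike .resolve, it never touches the file system or follows symlinks.

```python
from pathlib import PurePosixPath

def make_absolute(path, base):
    """Hypothetical helper: join `path` onto `base` unless it is already absolute."""
    path = PurePosixPath(path)
    return path if path.is_absolute() else PurePosixPath(base) / path

print(make_absolute("/etc/hosts", "/home/rodri"))  # /etc/hosts
print(make_absolute("notes.txt", "/home/rodri"))   # /home/rodri/notes.txt
```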
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Conditional expressions excel at evaluating to one of two distinct values, depending on the value
of a condition.”
This Pydon’t showed you that:
• conditional expressions are composed of three sub-expressions interleaved with the if and else
keywords;
• conditional expressions were created with the intent of providing a convenient way of choosing between
two values depending on a condition;
• conditional expressions can be easily read out as English statements;
• conditional expressions have very low precedence (only lambda expressions and assignment expressions have lower precedence);
• short-circuiting ensures conditional expressions only evaluate one of the two “value expressions”;
• conditional expressions can be chained together to emulate if: ... elif: ... else: ... blocks;
• a regular function cannot emulate a conditional expression, because all of its arguments are evaluated before it is called; and
• if your conditional expression evaluates to a Boolean, then you should only be working with the condition.
If you liked this Pydon’t, be sure to leave a reaction below and share this with your friends and fellow Pythonistas. Also, subscribe to the newsletter so you don’t miss a single Pydon’t!
References
• Python 3 Docs, The Python Language Reference, Conditional Expressions, https://docs.python.org/3/reference/expressions.html#conditional-expressions [last accessed 28-09-2021];
• PEP 308 – Conditional Expressions, https://www.python.org/dev/peps/pep-0308/ [last accessed 28-09-2021];
• Python 3 Docs, The Python Standard Library, collections.ChainMap, https://docs.python.org/3/library/collections.html#chainmap-objects [last accessed 28-09-2021];
• Python 3 Docs, The Python Standard Library, pathlib.Path.resolve, https://docs.python.org/3/library/pathlib.html#pathlib.Path.resolve [last accessed 28-09-2021];
• “Does Python have a ternary conditional operator?”, Stack Overflow question and answers, https://stackoverflow.com/questions/394809/does-python-have-a-ternary-conditional-operator [last accessed 28-09-2021];
• “Conditional Expressions in Python”, note.nkmk.me, https://note.nkmk.me/en/python-if-conditional-expressions/ [last accessed 28-09-2021];
• “Conditional Statements in Python, Conditional Expressions (Python’s Ternary Operator)”, Real Python, https://realpython.com/python-conditional-statements/#conditional-expressions-pythons-ternary-operator [last accessed 28-09-2021].
Pass-by-value, reference, and assignment
(Thumbnail of the original article at https://mathspp.com/blog/pydonts/pass-by-value-reference-and-assignment.)
Introduction
Many traditional programming languages employ one of two models when passing arguments to functions:
• some languages use the pass-by-value model; and
• most of the others use the pass-by-reference model.
Having said that, it is important to know the model that Python uses, because that influences the way your
code behaves.
In this Pydon’t, you will:
• see that Python uses neither the pass-by-value nor the pass-by-reference model;
• understand that Python uses a pass-by-assignment model;
• learn about the built-in function id;
• build a better understanding of the Python object model;
• realise that every object has 3 very important properties that define it;
• understand the difference between mutable and immutable objects;
• learn the difference between shallow and deep copies; and
• learn how to use the module copy to do both types of object copies.
Is Python pass-by-value?
In the pass-by-value model, when you call a function with a set of arguments, the data is copied into the function. This means that you can modify the arguments however you please without being able to alter the state of the program outside the function. This is not what Python does: Python does not use the pass-by-value model.
Looking at the snippet of code that follows, it might look like Python uses pass-by-value:
def foo(x):
    x = 4
a = 3
foo(a)
print(a)
## 3
This looks like the pass-by-value model because we gave it a 3, changed it to a 4, and the change wasn’t
reflected on the outside (a is still 3).
But, in fact, Python is not copying the data into the function.
To prove this, I’ll show you a different function:
def clearly_not_pass_by_value(my_list):
    my_list[0] = 42
l = [1, 2, 3]
clearly_not_pass_by_value(l)
print(l)
## [42, 2, 3]
As we can see, the list l, that was defined outside of the function, changed after calling the function
clearly_not_pass_by_value. Hence, Python does not use a pass-by-value model.
Is Python pass-by-reference?
In a true pass-by-reference model, the called function gets access to the variables of the caller! Sometimes, it can look like that’s what Python does, but Python does not use the pass-by-reference model.
I’ll do my best to explain why that’s not what Python does:
def not_pass_by_reference(my_list):
    my_list = [42, 73, 0]
l = [1, 2, 3]
not_pass_by_reference(l)
print(l)
## [1, 2, 3]
If Python used a pass-by-reference model, the function would’ve managed to completely change the value
of l outside the function, but that’s not what happened, as we can see.
Let me show you an actual pass-by-reference situation.
Here’s some Pascal code:
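program callByReference;
var
    x: integer;
procedure foo(var a: integer);
{ `var` makes `a` refer to the caller's variable }
begin
    a := 6 { assign 6 to `a` }
end;
begin
    x := 2; { assign 2 to `x` }
    writeln(x); { print `x` }
    foo(x); { call `foo` with `x` }
    writeln(x); { print `x` }
end.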
Look at the last lines of that code:
• we assign 2 to x with x := 2;
• we print x;
• we call foo with x as argument; and
• we print x again.
What’s the output of this program?
I imagine that most of you won’t have a Pascal interpreter lying around, so you can just go to tio.run and run this code online.
If you run this, you’ll see that the output is
2
6
which can be rather surprising, if the majority of your programming experience is in Python!
The procedure foo effectively received the variable x and changed the value that it contained. After foo was
done, the variable x (that lives outside foo) had a different value. You can’t do anything like this in Python.
(Im)mutability
The (im)mutability of an object depends on its type. In other words, (im)mutability is a characteristic of types,
not of specific objects!
But what exactly does it mean for an object to be mutable? Or for an object to be immutable?
Recall that an object is characterised by its identity, its type, and its contents. A type is mutable if you can change the contents of its objects without changing their identity or their type.
Lists are a great example of a mutable data type. Why? Because lists are containers: you can put things
inside lists and you can remove stuff from inside those same lists.
Below, you can see how the contents of the list obj change as we make method calls, but the identity of the
list remains the same:
>>> obj = [42]
>>> id(obj)
2287844221184
>>> obj.append(0); obj.extend([1, 2, 3]); obj
[42, 0, 1, 2, 3]
>>> id(obj)
2287844221184
>>> obj.pop(0); obj.pop(0); obj.pop(); obj
42
0
3
[1, 2]
>>> id(obj)
2287844221184
However, when dealing with immutable objects, it’s a completely different story. If we check an English
dictionary, this is what we get for the definition of “immutable”:
adjective: immutable – unchanging over time or unable to be changed.
Immutable objects’ contents never change. Take a string as an example:
>>> obj = "Hello, world!"
Strings are a good example for this discussion because, sometimes, they can look mutable. But they are not!
A very good indicator that an object is immutable is when all its methods return something. This is unlike
list’s .append method, for example! If you use .append on a list, you get no return value. On the other hand,
whatever method you use on a string, the result is returned to you:
>>> [].append(0) # No return.
>>> obj.upper() # A string is returned.
'HELLO, WORLD!'
Notice how obj wasn’t updated automatically to "HELLO, WORLD!". Instead, a new string was created and returned to you.
Another great hint at the fact that strings are immutable is that you cannot assign to its indices:
>>> obj[0]
'H'
>>> obj[0] = "h"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
This shows that, when a string is created, it remains the same. It can be used to build other strings, but the
string itself always. stays. unchanged.
As a reference, int, float, bool, str, tuple, and complex are the most common types of immutable objects;
list, set, and dict are the most common types of mutable objects.
The operator is
This is exactly what the operator is does: it checks if the two objects are the same.
For two objects to be the same, they must have the same identity:
>>> foo is obj
True
>>> bar is foo
True
>>> obj is foo
True
It is not enough to have the same type and contents! We can create a new list with contents [1, 2, 3] and
that will not be the same object as obj:
>>> obj is [1, 2, 3]
False
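The difference between “same type and contents” and “same object” is exactly the difference between the operators == and is. A short sketch with fresh variable names:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # A separate list with equal contents.
c = a          # Another name for the very same list.

print(a == b)          # True: same type and contents
print(a is b)          # False: two different objects
print(a is c)          # True: one object, two names
print(id(a) == id(c))  # True: `is` compares identities
```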
Think of it in terms of perfect twins. When two siblings are perfect twins, they look identical. However, they
are different people!
is not
Just as a side note, but an important one, you should be aware of the operator is not.
Generally speaking, when you want to negate a condition, you put a not in front of it:
n = 5
if not isinstance(n, str):
    print("n is not a string.")
## n is not a string.
So, if you wanted to check if two variables point to different objects, you could be tempted to write
if not a is b:
    print("`a` and `b` are different objects.")
However, Python has the operator is not, which is much more similar to a proper English sentence, which I
think is really cool!
Therefore, the example above should actually be written
if a is not b:
    print("`a` and `b` are different objects.")
Python does a similar thing for the in operator, providing a not in operator as well… How cool is that?!
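Both spellings compute the same result, because not binds more loosely than is and in; the versions with is not and not in simply read better. A quick check:

```python
a = [1, 2, 3]
b = [1, 2, 3]

# `not a is b` parses as `not (a is b)`, so both forms agree.
print((not a is b) == (a is not b))  # True
print(a is not b)                    # True: different objects

# The same goes for membership tests.
print((not 4 in a) == (4 not in a))  # True
print(4 not in a)                    # True
```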
Assignment as nicknaming
If we keep pushing this metaphor forward, assigning variables is just like giving a new nickname to someone.
My friends from middle school call me “Rojer”. My friends from college call me “Girão”. People I am not
close to call me by my first name – “Rodrigo”. However, regardless of what they call me, I am still me, right?
If one day I decide to change my haircut, everyone will see the new haircut, regardless of what they call me!
In a similar fashion, if I modify the contents of an object, I can use whatever nickname I prefer to see that
those changes happened. For example, we can change the middle element of the list we have been playing
around with:
>>> foo[1] = 42
>>> bar
[1, 42, 3]
>>> baz
[1, 42, 3]
>>> obj
[1, 42, 3]
We used the nickname foo to modify the middle element, but that change is visible from all other nicknames
as well.
Why?
Because they all pointed at the same list object.
Python is pass-by-assignment
Having laid out all of this, we are now ready to understand how Python passes arguments to functions.
When we call a function, each of its parameters is assigned to the object that was passed in for it. In essence, each parameter becomes a new nickname for the object it was given.
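You can check this with id: inside the function, the parameter refers to the exact same object as the argument outside. A minimal sketch with a hypothetical helper:

```python
def parameter_id(param):
    # `param` is just another nickname for whatever object was passed in.
    return id(param)

obj = [1, 2, 3]
print(parameter_id(obj) == id(obj))  # True: no copy was made
```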
Immutable arguments
If we pass in immutable arguments, then we have no way of modifying the arguments themselves. After all,
that’s what immutable means: “doesn’t change”.
That is why it can look like Python uses the pass-by-value model. Because the only way in which we can have
the parameter hold something else is by assigning it to a completely different thing. When we do that, we
are reusing the same nickname for a different object:
def foo(bar):
    bar = 3
    return bar
foo(5)
In the example above, when we call foo with the argument 5, it’s as if we were doing bar = 5 at the beginning
of the function.
Immediately after that, we have bar = 3. This means “take the nickname bar and point it at the integer 3”.
Python doesn’t care that bar, as a nickname (as a variable name) had already been used. It is now pointing
at that 3!
Mutable arguments
On the other hand, mutable arguments can be changed. We can modify their internal contents. A prime
example of a mutable object is a list: its elements can change (and so can its length).
That is why it can look like Python uses a pass-by-reference model. However, when we change the contents of an object, we don’t change the identity of the object itself. Similarly, when you change your haircut or your clothes, your social security number does not change:
>>> l = [42, 73, 0]
>>> id(l)
3098903876352
>>> l[0] = -1
>>> l.append(37)
>>> id(l)
3098903876352
Do you understand what I’m trying to say? If not, drop a comment below and I’ll try to help.
Making copies
Shallow vs deep copies
“Copying an object” means creating a second object that has a different identity (therefore, is a different
object) but that has the same contents. Generally speaking, we copy one object so that we can work with it
and mutate it, while also preserving the first object.
When copying objects, there are a couple of nuances that should be discussed.
Or, sometimes, you can just call methods and other functions directly on the original, because the original
is not going anywhere:
string = "Hello, world!"
print(string.lower())
## After calling `.lower`, `string` is still "Hello, world!"
So, we only need to worry about mutable objects.
Shallow copy
Many mutable objects can themselves contain mutable objects. Because of that, two types of copies exist:
• shallow copies; and
• deep copies.
The difference lies in what happens to the mutable objects inside the mutable objects.
Lists and dictionaries have a method .copy that returns a shallow copy of the corresponding object.
Let’s look at an example with a list:
>>> sublist = []
>>> outer_list = [42, 73, sublist]
>>> copy_list = outer_list.copy()
First, we create a list inside a list, and we copy the outer list. Now, because it is a copy, the copied list isn’t
the same object as the original outer list:
>>> copy_list is outer_list
False
But if they are not the same object, then we can modify the contents of one of the lists, and the other won’t
reflect the change:
>>> copy_list[0] = 0
>>> outer_list
[42, 73, []]
That’s what we saw: we changed the first element of the copy_list, and the outer_list remained un-
changed.
Now, we try to modify the contents of sublist, and that’s when the fun begins!
>>> sublist.append(999)
>>> copy_list
[0, 73, [999]]
>>> outer_list
[42, 73, [999]]
When we modify the contents of sublist, both the outer_list and the copy_list reflect those changes…
But wasn’t the copy supposed to give me a second list that I could change without affecting the first one?
Yes! And that is what happened!
In fact, modifying the contents of sublist doesn’t really modify the contents of either copy_list or outer_list: after all, the third element of both was pointing at a list object, and it still is! It’s the (inner) contents of the object to which we are pointing that changed.
Sometimes, we don’t want this to happen: sometimes, we don’t want mutable objects to share inner mutable
objects.
Deep copy
When you want to copy an object “thoroughly”, and you don’t want the copy to share references to inner
objects, you need to do a “deep copy” of your object. You can think of a deep copy as a recursive algorithm.
You copy the elements of the first level and, whenever you find a mutable element on the first level, you
recurse down and copy the contents of those elements.
To show this idea, here is a simple recursive implementation of a deep copy for lists that contain other lists:
def list_deepcopy(l):
    return [
        elem if not isinstance(elem, list) else list_deepcopy(elem)
        for elem in l
    ]
We can use this function to copy the previous outer_list and see what happens:
>>> sublist = []
>>> outer_list = [42, 73, sublist]
>>> copy_list = list_deepcopy(outer_list)
>>> sublist.append(73)
>>> copy_list
[42, 73, []]
>>> outer_list
[42, 73, [73]]
As you can see here, modifying the contents of sublist only affected outer_list indirectly; it didn’t affect
copy_list.
Sadly, the list_deepcopy function I implemented is neither very robust nor very versatile, but the Python
Standard Library has got us covered!
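The copy module provides copy.copy (shallow copy) and copy.deepcopy (deep copy). Here is a minimal sketch that reuses the lists from above; the settings list is a made-up example showing that copy.deepcopy also copies dictionaries nested inside a list, which list_deepcopy does not:

```python
import copy


def list_deepcopy(l):
    # The hand-rolled version from above: it only recurses into lists.
    return [
        elem if not isinstance(elem, list) else list_deepcopy(elem)
        for elem in l
    ]


sublist = []
outer_list = [42, 73, sublist]

shallow = copy.copy(outer_list)   # same effect as outer_list.copy()
deep = copy.deepcopy(outer_list)  # also copies the inner list
sublist.append(999)
print(shallow)  # [42, 73, [999]]
print(deep)     # [42, 73, []]

# A dictionary inside a list is shared by list_deepcopy, but not by copy.deepcopy.
settings = [{"debug": False}]
hand_rolled = list_deepcopy(settings)
library = copy.deepcopy(settings)
settings[0]["debug"] = True
print(hand_rolled)  # [{'debug': True}]
print(library)      # [{'debug': False}]
```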
Examples in code
Now that we have gone deep into the theory (pun intended), it is time to show you some actual code that
plays with these concepts.
def my_append(elem, l=[]):
    l.append(elem)
    return l
The function above appends an element to a list and, if no list is given, appends it to an empty list by default.
Great, let’s put this function to good use:
>>> my_append(1)
[1]
>>> my_append(1, [42, 73])
[42, 73, 1]
>>> my_append(3)
[1, 3]
We use it once with 1, and we get a list with the 1 inside. Then, we use it to append a 1 to another list we
had. And finally, we use it to append a 3 to an empty list… Except that’s not what happens!
As it turns out, default arguments are evaluated only once, when the function is defined, and the resulting
objects are stored in a special place, the function’s __defaults__ attribute:
>>> my_append.__defaults__
([1, 3],)
What this means is that the default argument is always the same object. Therefore, because it is a mutable
object, its contents can change over time. That is why, in the code above, __defaults__ shows a list with
two items already.
If we redefine the function, then its __defaults__ shows an empty list:
>>> def my_append(elem, l=[]):
...     l.append(elem)
...     return l
...
>>> my_append.__defaults__
([],)
This is why, in general, mutable objects shouldn’t be used as default arguments.
The standard practice, in these cases, is to use None as the default value and then, inside the function,
replace it with a new empty list:
def my_append(elem, l=None):
    if l is None:
        l = []
    l.append(elem)
    return l
With this implementation, the function now works as expected:
>>> my_append(1)
[1]
>>> my_append(3)
[3]
>>> my_append(3, [42, 73])
[42, 73, 3]
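A tempting shortcut is lst = l or [], which relies on Boolean short-circuiting. It behaves differently from an explicit is None check when the caller passes in an empty list of their own, because an empty list is falsy. The two function names below are hypothetical, used only for this comparison:

```python
def append_or(elem, l=None):
    # Short-circuiting: ANY falsy l, including an empty list, is replaced.
    lst = l or []
    lst.append(elem)
    return lst


def append_is_none(elem, l=None):
    # Only a missing argument (None) is replaced by a fresh list.
    if l is None:
        l = []
    l.append(elem)
    return l


mine = []
append_or(1, mine)
print(mine)  # [] : the caller's empty list was swapped for a new one

mine = []
append_is_none(1, mine)
print(mine)  # [1]
```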
is not None
Searching through the Python Standard Library shows that the is not operator is used a bit over 5,000
times. That’s a lot.
And that operator is almost always followed by None: in fact, is not None appears 3169 times in the
standard library!
x is not None does exactly what it says: it checks whether x is something other than the object None.
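This check is stricter than a plain truthiness test such as if x:, because values like 0, "" or [] are falsy but are not None. A small sketch with a made-up helper, describe, that reports both checks side by side:

```python
def describe(x):
    # `x is not None` is False only for the None object itself...
    identity_check = x is not None
    # ...while truthiness is also False for 0, "", [] and other "empty" values.
    truthiness_check = bool(x)
    return identity_check, truthiness_check


for value in [None, 0, "", [], 42]:
    print(repr(value), describe(value))
```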
Here is a simple example of it in use, from the argparse module, which is used to create command-line interfaces:
## From Lib/argparse.py in Python 3.9
class HelpFormatter(object):
    # ...
    class _Section(object):
        # ...
        def format_help(self):
            # format the indented section
            if self.parent is not None:
                self.formatter._indent()
            # ...
Even without a great deal of context, we can see what is happening: when displaying command help for a
given section, we may want to indent it (or not) to show hierarchical dependencies.
If a section’s parent is None, then that section has no parent, and there is no need to indent. In other
words, if a section’s parent is not None, then we want to indent it. Notice how my English matches the code
exactly!
'C:'
## Use list(os.environ.keys()) for a list of your environment variables.
The module http.server provides some classes for basic HTTP servers.
One of those classes, CGIHTTPRequestHandler, implements an HTTP server that can also run CGI scripts, and
its run_cgi method needs to set a bunch of environment variables.
These environment variables give the CGI script that is about to run the context it needs.
However, we don’t want to actually modify the current environment!
So, what we do is create a deep copy of the environment, and then we modify it to our heart’s content!
After we are done, we tell Python to execute the CGI script, and we provide the altered environment as an
argument.
The exact way in which this is done may not be trivial to understand. I, for one, don’t think I could explain it
to you. But that doesn’t mean we can’t infer parts of it. Here is the code:
## From Lib/http/server.py in Python 3.9
class CGIHTTPRequestHandler(SimpleHTTPRequestHandler):
    # ...
    def run_cgi(self):
        # ...
        env = copy.deepcopy(os.environ)
        env['SERVER_SOFTWARE'] = self.version_string()
        env['SERVER_NAME'] = self.server.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        # and many more `env` assignments!
        # ...
        else:
            # Non-Unix -- use subprocess
            # ...
            p = subprocess.Popen(cmdline,
                                 stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE,
                                 env = env
                                 )
As we can see, we copied the environment and defined some variables. Finally, we created a new subprocess
that gets the modified environment.
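The same copy-then-modify pattern can be sketched outside of http.server. This is not the standard library code: it uses os.environ.copy(), which returns a plain dict, instead of copy.deepcopy, and PYDONTS_DEMO is a made-up variable name. Since environment variable values are strings, which are immutable, a shallow copy is enough here:

```python
import os
import subprocess
import sys

# A plain dict copy: changing it leaves this process's os.environ alone.
env = os.environ.copy()
env["PYDONTS_DEMO"] = "hello"  # hypothetical variable, for illustration only

# Run a child Python process that only sees the modified copy.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYDONTS_DEMO'])"],
    env=env,
    capture_output=True,
    text=True,
)

print(result.stdout.strip())         # hello
print("PYDONTS_DEMO" in os.environ)  # False, assuming it was not set beforehand
```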
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Python uses a pass-by-assignment model, and understanding it requires you to realise all objects are
characterised by an identity number, their type, and their contents.”
This Pydon’t showed you that:
• Python doesn’t use the pass-by-value model, nor the pass-by-reference one;
• Python uses a pass-by-assignment model (using “nicknames”);
• each object is characterised by
– its identity;
– its type; and
– its contents.
• the id function is used to query an object’s identifier;
• the type function is used to query an object’s type;
• the type of an object determines whether it is mutable or immutable;
• shallow copies copy the references to nested mutable objects, so the copy and the original share them;
• deep copies perform copies that allow one object, and its inner elements, to be changed without ever
affecting the copy;
• copy.copy and copy.deepcopy can be used to perform shallow/deep copies; and
• you can implement __copy__ and __deepcopy__ if you want your own objects to be copiable.
See also
If you prefer video content, you can check this YouTube video, which was inspired by this Pydon’t.
References
• Python 3 Docs, Programming FAQ, “How do I write a function with output parameters (call by reference)?”,
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
[last accessed 04-10-2021];
• Python 3 Docs, The Python Standard Library, copy, https://docs.python.org/3/library/copy.html [last
accessed 05-10-2021];
• effbot.org, “Call by Object” (via “arquivo.pt”),
https://arquivo.pt/wayback/20160516131553/http://effbot.org/zone/call-by-object.htm [last accessed 04-10-2021];
• effbot.org, “Python Objects” (via “arquivo.pt”),
https://arquivo.pt/wayback/20191115002033/http://effbot.org/zone/python-objects.htm [last accessed 04-10-2021];
• Robert Heaton, “Is Python pass-by-reference or pass-by-value”,
https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/ [last
accessed 04-10-2021];
• StackOverflow question and answers, “How do I pass a variable by reference?”,
https://stackoverflow.com/q/986006/2828287 [last accessed 04-10-2021];
• StackOverflow question and answers, “Passing values in Python [duplicate]”,
https://stackoverflow.com/q/534375/2828287 [last accessed 04-10-2021];
• Twitter thread by @mathsppblog, https://twitter.com/mathsppblog/status/1445148566721335298 [last
accessed 20-10-2021].
String formatting comparison
Introduction
The Zen of Python says that
“There should be one – and preferably only one – obvious way to do it.”
And yet, there are three main ways of doing string formatting in Python. This Pydon’t will settle the score,
comparing these three methods and helping you decide which one is the obvious one to use in each situation.
In this Pydon’t, you will:
• learn about the old C-style formatting with %;
• learn about the string method .format;
• learn about the Python 3.6+ feature of literal string interpolation and f-strings;
• understand the key differences between each type of string formatting; and
• see where each type of string formatting really shines.
>>> language_rocks("Python")
'Python rocks!'
Great job!
Now, write a function that accepts a programming language name and its (estimated) number of users, and
returns a string saying something along the lines of “<insert language> rocks! Did you know that <insert
language> has around <insert number> users?”.
Can you do it? Recall that you are not supposed to use any string formatting facilities, whatsoever!
Here is a possible solution:
def language_info(language, users_estimate):
    return (
        language + " rocks! Did you know that " + language +
        " has around " + str(users_estimate) + " users?!"
    )
>>> language_info("Python", 10)
'Python rocks! Did you know that Python has around 10 users?!'
Notice how that escalated quite quickly: the purpose of our function is still very simple, and yet we have a
bunch of string concatenations happening all over the place, just because we have some pieces of information
that we want to merge into the string.
This is what string formatting is for: it’s meant to make your life easier when you need to put information
inside strings.
Three string formatting methods
Now that we’ve established that string formatting is useful, let’s take a look at the three main ways of doing
string formatting in Python.
First, here is how you would refactor the function above:
## Using C-style string formatting:
def language_info_cstyle(language, users_estimate):
    return (
        "%s rocks! Did you know that %s has around %d users?!" %
        (language, language, users_estimate)
    )
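For comparison, here is a sketch of the same refactor with the other two methods. The names language_info_format and language_info_fstring are our own, chosen to mirror language_info_cstyle, and all three return the same string:

```python
def language_info_cstyle(language, users_estimate):
    return (
        "%s rocks! Did you know that %s has around %d users?!" %
        (language, language, users_estimate)
    )


def language_info_format(language, users_estimate):
    # The string method .format fills the {} slots in order.
    return "{} rocks! Did you know that {} has around {} users?!".format(
        language, language, users_estimate
    )


def language_info_fstring(language, users_estimate):
    # f-strings evaluate the expressions inside {} directly.
    return (
        f"{language} rocks! Did you know that {language} "
        f"has around {users_estimate} users?!"
    )


print(language_info_fstring("Python", 10))
```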
C-style formatting
The C-style formatting, which is the one that has been around the longest, is characterised by a series of
percent signs ("%") that show up in the template strings.
(By “template strings”, I mean the strings in which we want to fill in the gaps.)
These percent signs indicate the places where the bits of information should go, and the character that
comes next (above, we’ve seen "%s" and "%d") determines how the information being passed in is treated.
Additionally, the way in which you apply the formatting is through the binary operator %: on the left you put
the template string, and on the right you put all the pieces of information you need to pass in.
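A short sketch of the % operator in action. A single value can go directly on the right, several values go in a tuple, and a tuple that is meant to be formatted as one value must be wrapped in another tuple; otherwise, % unpacks it and raises a TypeError:

```python
# A single value can go directly on the right of %.
print("%s rocks!" % "Python")               # Python rocks!

# Several values go in a tuple.
print("%s has %d users" % ("Python", 10))   # Python has 10 users

# A tuple meant as ONE value must be wrapped in another tuple...
point = (1, 2)
print("Point: %s" % (point,))               # Point: (1, 2)

# ...otherwise % unpacks it and complains about the extra item.
caught_error = None
try:
    "Point: %s" % point
except TypeError as error:
    caught_error = error
    print("TypeError:", error)
```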
String method .format
The string method .format is, like the name suggests, a method of the string type. This means that you
typically have a format string and, when you get access to the missing pieces of information, you just call the
.format method on that string.
Strings that use the method .format for formatting are typically characterised by the occurrence of a series
of curly braces "{}" within the string. It is also common to find that the method .format is called where/when
the string literal is defined.
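The curly braces can also be numbered, which lets a template reuse a value without passing it twice. A short sketch showing both styles, a template kept in a variable and filled in later, and a call to .format right where the literal is written:

```python
# Numbered fields: {0} is used twice, {1} once.
template = "{0} rocks! Did you know that {0} has around {1} users?!"

# The same template can be filled in with different values later on.
print(template.format("Python", 10))
print(template.format("Rust", 5))

# Calling .format right where the literal is defined is also common:
print("{} and {}".format("spam", "eggs"))  # spam and eggs
```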
Value conversion
When we do string formatting, the objects that we want to format into the template string need to be converted
to a string.
This is typically done by calling str on the objects, which in turn calls the dunder method __str__ of those
objects. However, sometimes it is beneficial to have the object be represented with the result from calling
repr, and not str. (I wrote about why you would want this before, so read this Pydon’t if you are not familiar
with how __str__/__repr__ works.)
There are special ways to specify which type of string conversion happens: %s and %r in C-style formatting,
and the conversion flags !s and !r in .format and f-strings.
Take this dummy class:
class Data:
    def __str__(self):
        return "str"
    def __repr__(self):
        return "repr"
With that class defined, the three following strings are the same, "str repr":
"%s %r" % (Data(), Data())
"{!s} {!r}".format(Data(), Data())
f"{Data()!s} {Data()!r}"
Alignment
When we need to format many values across many lines, for example to display a table-like piece of output,
we may want to align all values and pad them accordingly. This is one of the great use cases where string
formatting shines.
lang = "Python"
"%-10s" % lang
"{:<10}".format(lang)
f"{lang:<10}"
C-style formatting can’t do it, but the two modern methods can use ^ to align the output in the centre:
"{:^10}".format(lang)
f"{lang:^10}"
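The > specifier aligns to the right, and a fill character can be written right before the alignment symbol. C-style formatting right-aligns when given a plain positive width. A short sketch; the rows data is made up for illustration:

```python
lang = "Python"

print(repr(f"{lang:>10}"))   # '    Python'  (right-aligned)
print(repr(f"{lang:*^10}"))  # '**Python**'  (centred, padded with *)
print(repr("%10s" % lang))   # '    Python'  (C-style right alignment)

# Aligning a small table with fixed column widths:
rows = [("Python", 1991), ("Rust", 2015)]
table = [f"{name:<8}|{year:>6}" for name, year in rows]
print("\n".join(table))
```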
Named placeholders
For longer strings, or strings with many slots to be filled in, it may be helpful to use named placeholders
instead of bare symbols like %s or {}. With f-strings, this happens more or less automatically, but C-style
formatting and .format also support that:
name, age = "John", 73
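A sketch of named placeholders with each of the three methods: C-style formatting looks the names up in a dictionary, .format takes keyword arguments, and f-strings simply use the variable names:

```python
name, age = "John", 73

# C-style: named fields are looked up in a dictionary.
cstyle = "%(name)s is %(age)d years old." % {"name": name, "age": age}

# .format: named fields are filled in with keyword arguments.
method = "{name} is {age} years old.".format(name=name, age=age)

# f-string: the placeholders are the expressions themselves.
fstring = f"{name} is {age} years old."

print(cstyle)  # John is 73 years old.
```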
The string method .format also supports an interesting feature: the objects being formatted can be indexed,
and they can also have their attributes accessed.
Here is a very convoluted example:
class ConvolutedExample:
    values = [{"name": "Charles"}, {42: "William"}]

ce = ConvolutedExample()
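A sketch of what attribute access and indexing look like with that object. The class is repeated so the snippet runs on its own. Inside .format fields, keys are written without quotes, and a key made only of digits is looked up as an integer:

```python
class ConvolutedExample:
    values = [{"name": "Charles"}, {42: "William"}]


ce = ConvolutedExample()

# Attribute access (.values) and indexing ([0][name]) inside the field.
print("{0.values[0][name]}".format(ce))  # Charles
print("{0.values[1][42]}".format(ce))    # William

# The f-string equivalent uses regular Python expressions.
print(f"{ce.values[0]['name']} {ce.values[1][42]}")  # Charles William
```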
Parametrised formatting
Sometimes, you want to do some string formatting, but the exact formatting you do is dynamic: for example,
you might want to print something with variable width, and you’d like for the width to adapt to the longest
element in a sequence.
For example, say you have a list of companies and their countries of origin, and you want that to be aligned:
data = [("Toyota", "Japanese"), ("Ford", "USA")]
"""
Result is
Toyota, Japanese
Ford, USA
"""
The thing is, what if we now include a company with a longer name?
data = [("Toyota", "Japanese"), ("Ford", "USA"), ("Lamborghini", "Italy")]
"""
Result is
Toyota, Japanese
Ford, USA
Lamborghini, Italy
"""
The output is no longer aligned because the word “Lamborghini” does not fit within the specified width of 7.
Therefore, we need to dynamically compute the maximum lengths and use them to create the correct format
specification. This is where parametrising the format specification comes in handy:
data = [("Toyota", "Japanese"), ("Ford", "USA"), ("Lamborghini", "Italy")]
## Compute brand width and country width needed for formatting.
bw = 1 + max(len(brand) for brand, _ in data)
cw = 1 + max(len(country) for _, country in data)
"""
Result is
Toyota, Japanese
Ford, USA
Lamborghini, Italy
"""
Old style formatting only allows parametrisation of the width of the field and the precision used. For the
string method .format and for f-strings, parametrisation can be used with all the format specifier options.
month = "November"
prec = 3
value = 2.7182
f"{month:.{prec}} = {value:.{prec}f}"
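To make parametrisation concrete, here is a sketch that uses the brand width bw from the car example inside a nested pair of braces, then contrasts .format (which can parametrise other options, like the fill character) with C-style formatting (which uses * for width and precision). This is one possible way to produce the aligned output, not necessarily the code behind the results shown earlier:

```python
data = [("Toyota", "Japanese"), ("Ford", "USA"), ("Lamborghini", "Italy")]

# Brand column width: longest brand plus 1 for the trailing comma.
bw = 1 + max(len(brand) for brand, _ in data)

# The width comes from a variable, nested in an extra pair of braces.
lines = [f"{brand + ',':<{bw}} {country}" for brand, country in data]
print("\n".join(lines))

# .format can parametrise any part of the spec, here the fill and the width:
print("{:{fill}^{width}}".format("Python", fill="-", width=10))  # --Python--

# C-style formatting only parametrises width and precision, using *:
print(repr("%*s" % (10, "Python")))  # '    Python'
print("%.*f" % (3, 2.7182))          # 2.718
```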
Custom formatting
Finally, the string method .format and f-strings allow you to define how your own custom objects should be
formatted, and that happens through the dunder method __format__.
The dunder method __format__ accepts a string (the format specification) and it returns the corresponding
string.
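As a slightly more practical sketch than the one that follows, here is a hypothetical Temperature class whose format specification picks the unit. The same __format__ method is used by f-strings, by .format, and by the built-in format function:

```python
class Temperature:
    """Hypothetical class: stores degrees Celsius, formats as °C or °F."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        # "F" asks for Fahrenheit; any other spec gives Celsius.
        if format_spec == "F":
            return f"{self.celsius * 9 / 5 + 32:.1f}°F"
        return f"{self.celsius:.1f}°C"


t = Temperature(21.5)
print(f"{t}")                          # 21.5°C
print(f"{t:F}")                        # 70.7°F
print("{:F}".format(Temperature(0)))   # 32.0°F
print(format(Temperature(100), "F"))   # 212.0°F
```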
Here is a (silly) example:
class YN:
    def __format__(self, format_spec):
        return "N" if "n" in format_spec else "Y"

"{:aaabbbccc}".format(YN()) # Result is 'Y'
Examples in code
As the little snippets of code above have shown you, there is hardly any reason to be using the old string
formatting style. Of course, remember that consistency is important, so it might still make sense if you are
maintaining an old code base that uses old-style formatting everywhere.
Otherwise, you are better off using the string method .format and/or f-strings. Now, I will show you some
usage patterns and I will help you figure out what type of string formatting works best in those cases.
Plain formatting
F-strings are very, very good. They are short to type, they have good locality properties (it is easy to see what
is being used to format that specific portion of the string), and they are fast.
For all your plain formatting needs, prefer f-strings over the method .format:
## Some random variables:
name, age, w = "John", 73, 10
## Prefer...
f"{name!s} {name!r}"
f"{name:<10}"
f"{name} is {age} years old."
f"{name:^{w}}"
Data in a dictionary
If all your formatting data is already in a dictionary, then using the string method .format might be the best
way to go.
This is especially true if the keys of said dictionary are strings. When that is the case, using the string method
.format almost looks like using f-strings! Except that, when the data is in a dictionary, using f-strings is much
more verbose when compared to the usage of ** in .format:
data = {"name": "John", "age": 73}
## This is nice:
"{name} is {age} years old.".format(**data)
## This is cumbersome:
f"{data['name']} is {data['age']} years old."
In the example above, we see that the .format example exhibits the usual locality that f-strings tend to
benefit from!
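A close relative of .format(**data) is the string method .format_map, which takes the dictionary itself instead of unpacking it. With either one, extra keys in the dictionary are simply ignored. A short sketch; the person dictionary is made up:

```python
data = {"name": "John", "age": 73}

# .format_map receives the mapping directly, no ** needed.
print("{name} is {age} years old.".format_map(data))  # John is 73 years old.

# Keys that the template does not use are ignored.
person = {"name": "Ana", "age": 30, "city": "Lisbon"}
print("{name} is {age} years old.".format(**person))  # Ana is 30 years old.
```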
Deferred formatting
If you need to create your formatting string first, and only format it later, then you cannot use f-strings.
When that is the case, using the method .format is probably the best way to go.
This type of scenario might arise, for example, from programs that run in (many) different languages:
def get_greeting(language):
    if language == "pt":
        return "Olá, {}!"
    else:
        return "Hello, {}!"
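Here is how such a template might be used: the string is chosen first and only filled in later, with .format. An f-string could not play this role, because it is evaluated immediately, so the name would have to exist when the template is created. The function is repeated so the sketch runs on its own:

```python
def get_greeting(language):
    if language == "pt":
        return "Olá, {}!"
    else:
        return "Hello, {}!"


# Pick the template now...
template = get_greeting("pt")

# ...and fill it in later, once the name is known.
print(template.format("Rodrigo"))  # Olá, Rodrigo!
```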
Conclusion
Here’s the main takeaway of this Pydon’t, for you, on a silver platter:
“Don’t use old-style string formatting: use f-strings whenever possible, and then .format in the
other occasions.”
This Pydon’t showed you that:
• Python has three built-in types of string formatting;
• using .format and/or f-strings is preferred over %-formatting;
• you can use !s and !r to specify which type of string representation to use;
• alignment can be done with the <, ^, and > specifiers;
• format specifications can be parametrised with an extra level of {};
• custom formatting can be implemented via the dunder method __format__;
• f-strings are very suitable for most standard formatting tasks;
• the method .format is useful when the formatting data is inside a dictionary; and
• for deferred string formatting, f-strings don’t work, meaning .format is the recommended string formatting
method.
References
• Python 2 Docs, String Formatting Operations, https://docs.python.org/2/library/stdtypes.html#string-formatting
[last accessed 10-11-2021];
• Python 3 Docs, The Python Tutorial, Fancier Output Formatting, https://docs.python.org/3/tutorial/inputoutput.html
[last accessed 10-11-2021];
• Python 3 Docs, The Python Standard Library, string, https://docs.python.org/3/library/string.html#string-formatting
[last accessed 10-11-2021];
• PEP 3101 – Advanced String Formatting, https://www.python.org/dev/peps/pep-3101/ [last accessed
10-11-2021];
• PEP 498 – Literal String Interpolation, https://www.python.org/dev/peps/pep-0498/ [last accessed
10-11-2021];
• PyFormat, https://pyformat.info [last accessed 17-11-2021].
Closing thoughts
I would like to thank you for investing your time in improving your Python knowledge through this book, and I
invite you to let me know your feedback.
All criticism that you might have, positive and negative, is welcome, and I will read it. Just drop me a
line at [email protected] or reach out to me on Twitter where I go by the name mathsppblog.
I hope to talk to you soon!
— Rodrigo, https://mathspp.com