Hypermodern Python Tooling
by Claudio Jolowicz
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take advantage
of these technologies long before the official release of these titles.
The views expressed in this work are those of the author, and do not
represent the publisher’s views. While the publisher and the author have
used good faith efforts to ensure that the information and instructions
contained in this work are accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without limitation
responsibility for damages resulting from the use of or reliance on this
work. Use of the information and instructions contained in this work is at
your own risk. If any code samples or other technology this work contains
or describes is subject to open source licenses or the intellectual property
rights of others, it is your responsibility to ensure that your use thereof
complies with such licenses and/or rights.
978-1-098-13958-2
Dedication
To Marianna
Preface
This book is a guide to modern Python developer tools—the programs that
help you with tasks such as installing Python, managing environments and
dependencies, and building, testing, and publishing packages.
You don’t strictly need these tools to write Python software. Fire up your
system’s Python interpreter and get an interactive prompt. Save your Python
code as a script for later. Why use anything beyond an editor and a shell?
This is not a rhetorical question. Every tool you add to your development
workflow should have a clear purpose and bring benefits that outweigh the
costs of using it. Generally, the benefits of development tooling become
manifest when you need to make development sustainable over time. At
some point, publishing your module on the Python Package Index will be
easier than emailing it to your users.
This book will show you how developer tooling can help with such
challenges. The tools described here greatly benefit the code quality,
security, and maintainability of Python projects.
But tooling also adds complexity and overhead. The book strives to
minimize that by forging tools into an easy-to-use toolchain, and by
automating workflows reliably and repeatably—whether they execute
locally on a developer machine, or on a continuous integration server across
a range of platforms and environments. As much as possible, you should be
able to focus your attention on writing software, with your toolchain
working in the background.
Laziness has been called “a programmer’s greatest virtue,” and this saying
applies to development tooling, too: Keep your workflow simple and don’t
adopt tools for their own sake. At the same time, good programmers are
also curious. Give the tools in this book a try to see what value they may
bring to your projects.
You’re proficient with Python, but you’re not sure how to create a
package.
You’ve been doing this for years—setuptools, virtualenv, and pip are
your friends. You’re curious about recent developments in tooling and
what they bring to the table.
You maintain mission-critical code that runs in production. But there
must be a better way to do all of this. You want to learn about state-of-
the art tools and evolving best practices.
You want to be more productive as a Python developer.
You’re an open source maintainer looking for a robust and modern
project infrastructure.
You’re using a bunch of Python tools in your projects, but it’s hard to
see how everything fits together. You want to reduce the friction that
comes with all this tooling.
“Things just keep breaking—why doesn’t Python find my module
now? Why can’t I import the package I just installed?”
This book assumes that you have a basic knowledge of the Python
programming language. The only tooling you need to be familiar with are
the Python interpreter, an editor or IDE, and the command line of your
operating system.
Italic
Indicates new terms, URLs, email addresses, filenames, and file
extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to
program elements such as variable or function names, statements, and
keywords.
TIP
This element signifies a tip or suggestion.
NOTE
This element signifies a general note.
This book is here to help you get your job done. In general, if example code
is offered with this book, you may use it in your programs and
documentation. You do not need to contact us for permission unless you’re
reproducing a significant portion of the code. For example, writing a
program that uses several chunks of code from this book does not require
permission. Selling or distributing examples from O’Reilly books does
require permission. Answering a question by citing this book and quoting
example code does not require permission. Incorporating a significant
amount of example code from this book into your product’s documentation
does require permission.
If you feel your use of code examples falls outside fair use or the
permission given above, feel free to contact us at permissions@oreilly.com.
NOTE
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge,
and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and
expertise through books, articles, and our online learning platform.
O’Reilly’s online learning platform gives you on-demand access to live
training courses, in-depth learning paths, interactive coding environments,
and a vast collection of text and video from O’Reilly and 200+ other
publishers. For more information, visit https://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the
publisher:
O’Reilly Media, Inc.
Sebastopol, CA 95472
707-829-0104 (fax)
https://www.oreilly.com/about/contact.html
We have a web page for this book, where we list errata, examples, and any
additional information. You can access this page at
https://oreil.ly/hypermodern-python-tooling.
For news and information about our books and courses, visit
https://oreilly.com.
Acknowledgments
This book covers many open source Python projects. I am very grateful to
their authors and maintainers, most of whom work on them in their free
time, often over many years. In particular, I would like to acknowledge the
unsung heroes of the PyPA, whose work on packaging standards lets the
ecosystem evolve towards better tooling. Special thanks to Thea Flowers
for writing Nox and building a welcoming community.
Before this book, there was the Hypermodern Python article series. I would
like to thank Brian Okken, Michael Kennedy, and Paul Everitt for spreading
the word, and Brian for giving me the courage to turn it into a book.
Chapter 1. Installing Python
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take advantage
of these technologies long before the official release of these titles.
This will be the first chapter of the final book. Please note that the GitHub
repo will be made active later on.
If you have comments about how we might improve the content and/or
examples in this book, or if you notice missing material within this chapter,
please reach out to the author at [email protected].
If you’ve picked up this book, you likely have Python installed on your
machine already. Most common operating systems ship with a python3
command. This can be the interpreter used by the system itself; on
Windows and macOS, it’s a placeholder that installs Python for you when
you invoke it for the first time.
Why dedicate an entire chapter to the topic if it’s so easy to get Python onto
a new machine? The answer is that installing Python for long-term
development can be a complex matter, and there are several reasons for this:
You generally need multiple versions of Python installed side-by-side.
(If you’re wondering why, we’ll get to that shortly.)
There are a few different ways to install Python across the common
platforms, each with unique advantages, tradeoffs, and sometimes
pitfalls.
Python is a moving target: You need to keep existing installations up-
to-date with the latest maintenance release, add installations when a
new feature version is published, and remove versions that are no
longer supported. You may even need to test a prerelease of the next
Python.
You may want your code to run on multiple platforms. While Python
makes it easy to write portable programs, setting up a developer
environment requires some familiarity with the idiosyncrasies of each
platform.
You may want to run your code with an alternative implementation of
Python.
In this first chapter, I’ll show you how to install multiple Python versions
on some of the major operating systems in a sustainable way, and how to
keep your little snake farm in good shape.
TIP
Even if you only develop for a single platform, I’d encourage you to learn about working with
Python on other operating systems. It’s fun—and familiarity with other platforms enables you to
provide a better experience to the contributors and users of your software.
For these reasons, it’s common to support both current and past versions of
Python until their official end-of-life date, and to set up installations for
them side-by-side on a developer machine. With new feature versions
coming out every year and support extending over five years, this gives you
a testing matrix of five actively supported versions (see Figure 1-1). If that
sounds like a lot of work, don’t worry: the Python ecosystem comes with
tooling that makes this a breeze.
THE PYTHON RELEASE CYCLE
Python has an annual release cycle: feature releases happen every October.
Each feature release gets a new minor version in Python’s
major.minor.micro scheme. By contrast, new major versions are
rare and reserved for strongly incompatible changes—as I write this in early
2024, a Python 4 is not in sight. Python’s backward compatibility policy
allows incompatible changes in minor releases when preceded by a two-
year deprecation period.
Feature versions are maintained for five years, after which they reach end-
of-life. Bugfix releases for a feature version occur roughly every other
month during the first 18 months after its initial release. This is followed
by security updates whenever necessary during the remainder of the five-
year support period. Each maintenance release bumps the micro version.
On Windows, PATH -based interpreter discovery is less relevant because Python installations can be
located via the Windows Registry (see “The Python Launcher for Windows”). Windows installers
only ship an unversioned python.exe executable.
Figure 1-2. A developer system with multiple Python installations. The search path is displayed as a
stack of directories; commands at the top shadow those further down.
The order of directories on the search path matters because earlier entries
take precedence over, or “shadow”, later ones. In Figure 1-2, python3
refers the current stable version (Python 3.12). If you omitted the top entry,
python3 would refer to the prerelease (Python 3.13). Without the top
two entries, it would refer to Homebrew’s default interpreter, which is still
on the previous stable version (Python 3.11).
export PATH="/usr/local/opt/python/bin:$PATH"
You’re adding the bin subdirectory instead of the installation root, because
that’s where the interpreter is normally located on these systems. We’ll take
a closer look at the layout of Python installations in Chapter 2. Also, you’re
adding the directory to the front of the PATH variable. I’ll explain shortly
why this is usually what you want.
The line above also works with Zsh, which is the default shell on macOS.
That said, there’s a more idiomatic way to manipulate the search path on
Zsh:
typeset -U path
path=(/usr/local/opt/python/bin $path)
This instructs the shell to remove duplicate entries from the search
path.
The shell keeps the path array synchronized with the PATH
variable.
On the Fish shell, use the fish_add_path function instead:
fish_add_path /usr/local/opt/python/bin
It would be tedious to set up the search path manually at the start of every
shell session. Instead, you can place the commands above in your shell
profile—a file in your home directory that is read by the shell on startup.
Table 1-1 shows the most common ones.
Table 1-1. Startup files of common shells
Shell    Startup file
Zsh      .zshrc
Fish     .config/fish/config.fish
Unless your system already comes with a well-curated and up-to-date selection of interpreters,
prepend Python installations to the PATH environment variable, with the latest stable version at the
very front.
Curiously, the PATH mechanism has remained essentially the same since
the 1970s. In the original design of the Unix operating system, the shell still
looked up commands entered by the user in a directory named /bin. With
the 3rd edition of Unix (1973), this directory—or rather, the 256K drive that
backed it—became too small to hold all available programs. Researchers at
Bell Labs introduced an additional filesystem hierarchy rooted at /usr,
allowing a second disk to be mounted. But now the shell had to search for
programs across multiple directories—/bin and /usr/bin. Eventually, the
Unix designers settled on storing the list of directories in an environment
variable named PATH . Since every process inherits its own copy of the
environment, users can customize their search path without affecting system
processes.
NOTE
Depending on your domain and target environment, you may prefer to use the Windows Subsystem
for Linux (WSL) for Python development. In this case, please refer to the section “Installing Python
on Linux” instead.
Binary installers are only provided up to the last bugfix release of each
Python version, which occurs around 18 months after the initial release.
Security updates for older versions, on the other hand, are provided as
source distributions only. If you don’t want to build Python from source,
you can use one of the excellent Python Standalone Builds, a collection of
self-contained, highly portable Python distributions.
When you install a new feature release of Python, there are some additional
steps to be mindful of:
Enable the option to add the new Python to the PATH environment
variable.
Remove the previous Python release from PATH . You can edit the
environment variables for your account using the System Settings tool
that is part of Windows.
You may also wish to reinstall some of your developer tooling, to
ensure that it runs on the latest Python version.
Eventually, a Python version will reach its end of life, and you may wish to
uninstall it to free up resources. You can remove an existing installation
using the Installed Apps tool. Choose the Uninstall action for its entry in the
list of installed software. Beware that removing a Python version will break
projects and tools that are still using it, so you should upgrade those to a
newer Python beforehand.
Windows systems ship with a python stub that redirects the user to the
latest Python package on the Microsoft Store. The Microsoft Store package
is intended mainly for educational purposes, and does not have full write
access to some shared locations on the filesystem and the registry. While
it’s useful for teaching Python to beginners, I would not recommend it for
most intermediate and advanced Python development.
> py
Python 3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024,
Type "help", "copyright", "credits" or "license"
>>>
By default, the Python Launcher selects the most recent version of Python
installed on the system. It’s worth noting that this may not be the same as
the most recently installed version on the system. This is good—you don’t
want your default Python to change when you install a bugfix release for an
older version.
If you want to launch a specific version of the interpreter, you can pass the
feature version as a command-line option:
> py -3.11
Python 3.11.8 (tags/v3.11.8:db85d51, Feb 6 2024,
Type "help", "copyright", "credits" or "license"
>>>
> py -V
Python 3.12.2
> py -3.11 -V
Python 3.11.8
Using the same mechanism, you can run a script on a specific interpreter:
NOTE
For historical reasons, py also inspects the first line of the script to see if a version is specified
there. The canonical form is #!/usr/bin/env python3 , which corresponds to py -3 and
works across all major platforms.
As you have seen, the Python Launcher defaults to the newest version on
the system. There is an exception to this rule: if a virtual environment is
active, py defaults to the interpreter in the virtual environment.
When you install a prerelease of Python, the Python Launcher will use it as
the default interpreter instead of the current release—after all, it’s the
newest version on the system. In this case, you should override the default
by setting the PY_PYTHON and PY_PYTHON3 environment variables to
the current release:
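The commands themselves aren’t shown in this draft. As a sketch—assuming
3.12 is the current release—you could set the variables persistently from
a command prompt like this:
> setx PY_PYTHON 3.12
> setx PY_PYTHON3 3.12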
Restart the console for the setting to take effect. Don’t forget to remove
these variables once you upgrade from the prerelease to the final release.
To conclude our short tour of the Python Launcher, use the command py
--list to enumerate the interpreters on your system:
> py --list
-V:3.13 Python 3.13 (64-bit)
-V:3.12 * Python 3.12 (64-bit)
-V:3.11 Python 3.11 (64-bit)
-V:3.10 Python 3.10 (64-bit)
-V:3.9 Python 3.9 (64-bit)
-V:3.8 Python 3.8 (64-bit)
In this listing, the asterisk marks the default version of Python.
TIP
Even if you always use the Python Launcher yourself, you should still keep your PATH up-to-date.
Some third-party tools run the python.exe command directly—you don’t want them to use an
outdated Python version or fall back to the Microsoft Store shim.
NOTE
Whenever you see names like python3.x or python@3.x in this section, replace 3.x with
the actual feature version. For example, use python3.12 and python@3.12 for Python 3.12.
You may find that you already have some Python versions installed for
other Homebrew packages that depend on them. Nonetheless, it’s important
that you install every version explicitly. Automatically installed packages
may get deleted when you run brew autoremove to clean up
resources.
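The installation command isn’t shown in this draft; with Homebrew it looks
like this (replace 3.x with the feature version, per the note above):
$ brew install python@3.x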
Next, prepend the bin directory from this installation to your PATH . Here’s
an example that works on the Bash shell:
export PATH="/opt/homebrew/opt/python@3.x/bin:$PATH"
On the other hand, Homebrew Python also comes with some limitations—for
example, prereleases of upcoming Python versions are not available from
Homebrew.
TIP
Personally, I recommend Homebrew for managing Python on macOS—it’s well-integrated with the
rest of the system and easy to keep up-to-date. Use the python.org installers to test your code against
prereleases, which are not available from Homebrew.
The core Python team provides official binary installers in the Downloads
for macOS section of the Python website. Download the 64-bit universal2
installer for the release you wish to install. The universal2 binaries of the
interpreter run natively on both Apple Silicon and Intel chips.
After installing a Python version, run the Install Certificates command located in the
/Applications/Python 3.x/ folder. This command installs Mozilla’s curated collection of root
certificates, which are required to establish secure internet connections from Python.
When you install a bugfix release for a Python version that is already
present on the system, it will replace the existing installation. You can
uninstall a Python version by removing these two directories:
/Library/Frameworks/Python.framework/Versions/3.x/
/Applications/Python 3.x/
The system Python in a Linux distribution may be quite old, and not all
distributions include alternate Python versions in their main package
repositories.
Linux distributions have mandatory rules about how applications and
libraries may be packaged. For example, Debian’s Python Policy
mandates that the standard ensurepip module must be shipped in a
separate package; as a result, you can’t create virtual environments on
a default Debian system (a situation commonly fixed by installing the
python3-full package).
The main Python package in a Linux distribution serves as the
foundation for other packages that require a Python interpreter. These
packages may include critical parts of the system, such as Fedora’s
package manager DNF. Distributions therefore apply safeguards to
protect the integrity of the system; for example, most distributions
prevent you from installing or uninstalling packages system-wide
using pip.
In the next sections, I’ll take a look at installing Python on two major Linux
distributions, Fedora and Ubuntu. Afterwards, I’ll cover some generic
installation methods that don’t use the official package manager:
Homebrew, Nix, Pyenv, and Conda. I’ll also introduce you to the Python
Launcher for Unix, a third-party package that aims to bring the py utility
to Linux, macOS, and similar systems.
Fedora Linux
Python comes pre-installed on Fedora, and you can install additional Python
versions using DNF:
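The command isn’t shown in this draft; as a sketch, with 3.12 standing in
for whichever feature version you need:
$ sudo dnf install python3.12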
Fedora has packages for all active feature versions and prereleases of
CPython, the reference implementation of Python, as well as packages with
alternative implementations like PyPy. A convenient shorthand to install all
of these at once is to install the tox package:
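That is:
$ sudo dnf install tox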
In case you’re wondering, tox is a test automation tool that makes it easy to
run a test suite against multiple versions of Python; its Fedora package pulls
in most available interpreters as recommended dependencies. Tox is also
the spiritual ancestor of Nox, the subject of Chapter 8.
Ubuntu Linux
You can now install Python versions using the APT package manager:
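The command isn’t shown in this draft. As a sketch—3.12 is a placeholder,
and I’m assuming the version is available from your configured package
sources, such as the deadsnakes PPA:
$ sudo apt install python3.12-full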
TIP
Always remember to include the -full suffix when installing Python on Debian and Ubuntu. The
python3.x-full packages pull in the entire standard library and up-to-date root certificates. In
particular, they ensure that you can create virtual environments.
Another fascinating option for both macOS and Linux is Nix, a purely
functional package manager with reproducible builds of thousands of
software packages. Nix makes it easy and fast to set up isolated
environments with arbitrary versions of software packages. Here’s how you
would set up a development environment with two Python versions:
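The command isn’t included in this draft; a minimal sketch using nix-shell
and the nixpkgs attribute names for two feature versions might look like
this:
$ nix-shell --packages python312 python311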
You can install the Nix package manager using its official installer. If
you’re not ready to install Nix permanently, you can get a taste of what’s
possible using the Docker image for NixOS, a Linux distribution built
entirely using Nix:
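For example, a sketch using the official nixos/nix image:
$ docker run --rm -it nixos/nix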
Below is an example session using the macOS system from Figure 1-2.
(Python 3.13 was a prerelease at the time of writing this, so I’ve changed
the default interpreter by setting PY_PYTHON and PY_PYTHON3 to
3.12 .)
$ py -V
3.12.1
$ py -3.11 -V
3.11.7
$ py --list
 3.13 │ /Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13
3.12 │ /opt/homebrew/bin/python3.12
3.11 │ /opt/homebrew/bin/python3.11
3.10 │ /opt/homebrew/bin/python3.10
You can run many third-party tools by passing their import name to the -m
interpreter option. Suppose you have installed pytest (a test framework) on
multiple Python versions. Using py -m pytest lets you determine
which interpreter should run the tool. By contrast, a bare pytest uses the
command that happens to appear first on your PATH .
If you invoke py with a Python script but don’t specify a version, py
inspects the first line of the script for a shebang—a line specifying the
interpreter for the script. Stick with the canonical form here:
#!/usr/bin/env python3 . Entry-point scripts are a more
sustainable way to link a script to a specific interpreter, because package
installers can generate the correct interpreter path during installation (see
“Entry-point scripts”).
WARNING
For compatibility with the Windows version, the Python Launcher only uses the Python version from
shebangs, not the full interpreter path. As a result, you can end up with a different interpreter than if
you were to invoke the script directly without py .
In this section, you’ll use Pyenv as a build tool. If you’re interested in using Pyenv as a version
manager, please refer to the official documentation for additional setup steps. I’ll discuss some of the
trade-offs in “Managing Python Versions with Pyenv”.
The best way to install Pyenv on macOS and Linux is using Homebrew:
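That is:
$ brew install pyenv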
One great benefit of installing Pyenv from Homebrew is that you’ll also get
the build dependencies of Python. If you use a different installation method,
check the Pyenv wiki for platform-specific instructions on how to set up
your build environment.
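You can display the versions available for installation using the install
command’s --list option:
$ pyenv install --list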
The list of interpreters is impressive. Not only does it cover all active
feature versions of Python, it also includes prereleases, unreleased
development versions, almost every point release published over the past
two decades, and a wealth of alternative implementations, such as PyPy,
GraalPy, MicroPython, Jython, IronPython, and Stackless Python.
You can build and install any of these versions by passing them to pyenv
install :
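For example (replace 3.x.y with an actual version number, such as 3.12.2):
$ pyenv install 3.x.y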
When using Pyenv as a mere build tool, as we’re doing here, you need to
add each installation to PATH manually. You can find its location using the
command pyenv prefix 3.x.y and append /bin to that. Here’s an
example for the Bash shell:
export PATH="$HOME/.pyenv/versions/3.x.y/bin:$PATH"
When you no longer need an installation, you can remove it like this:
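The command isn’t shown in this draft; it’s the uninstall counterpart of
the install command:
$ pyenv uninstall 3.x.y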
By default, Pyenv builds interpreters without optimizations. If you don’t
mind longer build times, you can enable profile-guided and link-time
optimization for a faster interpreter:
$ export PYTHON_CONFIGURE_OPTS='--enable-optimizations --with-lto'
If the practical advantages of the shim mechanism convince you, you may
also like asdf, a generic version manager for multiple language runtimes; its
Python plugin uses python-build internally. If you like per-directory
version management, but don’t like shims, take a look at direnv, which can
update your PATH whenever you enter a directory. (It can even create and
activate virtual environments for you.)
Installing Python from Anaconda
Anaconda is an open source software distribution for scientific computing,
maintained by Anaconda Inc. Its centerpiece is Conda, a cross-platform
package manager for Windows, macOS, and Linux. Conda packages can
contain software written in any language, such as C, C++, Python, R, or
Fortran.
In this section, you’ll use Conda to install Python. Conda does not install
software packages globally on your system. Each Python installation is
contained in a Conda environment and isolated from the rest of your
system. A typical Conda environment is centered around the dependencies
of a particular project—say, a set of libraries for machine learning or data
science—of which Python is only one among many.
Before you can create Conda environments, you’ll need to bootstrap a base
environment containing Conda itself. There are a few ways to go about this:
You can install the full Anaconda distribution, or you can use the
Miniconda installer with just Conda and a few core packages. Both
Anaconda and Miniconda download packages from the defaults channel,
which may require a commercial license for enterprise use.
Conda requires shell integration to update the search path and shell prompt
when you activate or deactivate an environment. If you’ve installed
Miniforge from Homebrew, update your shell profile using the conda
init command with the name of your shell. For example:
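If your shell is Bash, that would be:
$ conda init bash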
The Windows installer does not activate the base environment globally.
Interact with Conda using the Miniforge Prompt from the Windows Start
Menu.
Before you can use this Python installation, you need to activate the
environment:
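The commands aren’t shown in this draft. As a sketch—the environment name
py312 and the version are placeholders of mine—creating and activating an
environment with a specific Python looks like this:
$ conda create --name py312 python=3.12
$ conda activate py312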
This command will run in the active Conda environment. What’s great
about Conda is that it won’t upgrade Python to a release that’s not yet
supported by the Python libraries in the environment.
$ conda deactivate
Hatch lets you install all CPython and PyPy interpreters compatible with
your platform with a single command:
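The command isn’t included in this draft; assuming a recent Hatch release
with the python subcommand, it would be:
$ hatch python install all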
This command also adds the installation directories to your PATH . Re-
run the command with the --update option to upgrade the interpreters
to newer versions. Hatch organizes interpreters by feature version, so patch
releases overwrite the existing installation.
Rye fetches interpreters into the ~/.rye/py directory. Normally, this happens
behind the scenes when you synchronize the dependencies of your project.
But it’s also available as a dedicated command:
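For example (3.12 is a placeholder feature version):
$ rye fetch 3.12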
An Overview of Installers
Figure 1-3 provides an overview of the main Python installation methods
for Windows, Linux, and macOS.
Figure 1-3. Python installers for Windows, Linux, and macOS
The next chapter zooms into a Python installation: its contents and
structure, and how your code interacts with it. You’ll also learn about its
lightweight cousins, virtual environments, and the tooling that has evolved
around those.
1
While CPython is the reference implementation of Python, there are quite a few more to choose
from: performance-oriented forks such as PyPy and Cinder, re-implementations such as RustPython
and MicroPython, and ports to other platforms like WebAssembly, Java, and .NET.
2
At the time of writing in early 2024, the long-term support release of Debian Linux ships patched
versions of Python 2.7.16 and 3.7.3—both released half a decade ago. (Debian’s “testing”
distribution, which is widely used for development, comes with a current version of Python.)
3
Starting with Python 3.13, bugfix releases are provided for two years after the initial release.
4
Stack Overflow has a good step-by-step guide to building Windows installers.
5
“Virtual Environments” covers virtual environments in detail. For now, you can think of a virtual
environment as a shallow copy of a full Python installation that lets you install a separate set of third-
party packages.
6
Justin Mayer: “Homebrew Python Is Not For You,” February 3rd, 2021.
7
Do you have a Mac with Apple Silicon, but programs that must run on Intel processors? You’ll be
pleased to know that the python.org installers also provide a python3-intel64 binary using the
x86_64 instruction set. You can run it on Apple Silicon thanks to Apple’s Rosetta translation
environment.
8
The Unix command-line tools option places symbolic links in the /usr/local/bin directory, which can
conflict with Homebrew packages and other versions from python.org. A symbolic link is a special
kind of file that points to another file, much like a shortcut in Windows.
9
For historical reasons, framework builds use a different path for the per-user site directory, the
location where packages are installed if you invoke pip outside of a virtual environment and without
administrative privileges. This different installation layout can prevent you from importing a
previously installed package.
10
In a future release, Hatch will add interpreters to the Windows registry as well, letting you use them
with the Python Launcher.
Chapter 2. Python Environments
With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take advantage
of these technologies long before the official release of these titles.
This will be the second chapter of the final book. Please note that the
GitHub repo will be made active later on.
If you have comments about how we might improve the content and/or
examples in this book, or if you notice missing material within this chapter,
please reach out to the author at [email protected].
NOTE
This book uses Python environment as an umbrella term that includes both system-wide installations
and virtual environments. Beware that some people only use the term for project-specific
environments, like virtual environments or Conda environments.
Figure 2-1. Python environments consist of an interpreter and modules. Virtual environments share
the interpreter and the standard library with their parent environment.
NOTE
This chapter uses the Python Launcher to invoke the interpreter (see “The Python Launcher for
Windows” and “The Python Launcher for Unix”). If you don’t have it installed, replace py with
python3 when running the examples.
You can run a Python program by passing the script to the interpreter:
$ py hello.py
Alternatively, you can pass a module with the -m option, provided that the
interpreter can import the module:
$ py -m hello
If the program is installed with an entry-point script, you can also
invoke it by name:
$ hello
This method is convenient, but there’s also a drawback: If you’ve installed
the program in multiple environments, the first environment on PATH
“wins”. In such a scenario, the form py -m hello offers you more
control.
For this reason, the canonical way to install a package with pip uses the -
m form:
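The command isn’t shown in this draft; using httpx (a package that appears
later in this chapter) as a stand-in, the canonical form looks like this:
$ py -m pip install httpx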
A second method—running a globally installed pip with its --python option
pointed at the target environment, as shown later in this chapter—has the
advantage of not requiring pip in every environment.
Python Installations
This section takes you on a tour of a Python installation. Feel free to follow
along on your own system. Table 2-1 shows the most common locations—
replace 3.x and 3x with the Python feature version, such as 3.12 and
312 .
Table 2-1. Locations of Python installations
Platform                   Location
Windows (single-user)      %LocalAppData%\Programs\Python\Python3x
Windows (multi-user)       %ProgramFiles%\Python3x
macOS (Homebrew) a         /opt/homebrew/Frameworks/Python.framework/Versions/3.x
macOS (python.org)         /Library/Frameworks/Python.framework/Versions/3.x
Linux (generic)            /usr/local
Linux (package manager)    /usr
a
Homebrew on macOS Intel uses /usr/local instead of /opt/homebrew.
An installation might be cleanly separated from the rest of your system, but
not necessarily. On Linux, it goes into a shared location like /usr or
/usr/local, with its files scattered across the filesystem. Windows systems,
on the other hand, keep all files in a single place. Framework builds on
macOS are similarly self-contained, although distributions may also install
symbolic links into the traditional Unix locations.
In the following sections, you’ll take a closer look at the core parts of
Python installations—the interpreter and the modules, as well as some other
components such as entry-point scripts and shared libraries.
The layout of Python installations varies quite a bit from system to system.
The good news is, you rarely have to care—a Python interpreter knows its
environment. For reference, Table 2-2 provides a baseline for installation
layouts on the major platforms. All paths are relative to the installation root.
Table 2-2. Layout of Python installations (columns: Files, Windows, Linux
and macOS, Notes)
The interpreter
In an interactive session, import the sys module and inspect the following
variables:
sys.version_info
sys.implementation.name
sys.implementation.version
sys.executable
sys.prefix
sys.path
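For illustration, here’s what such a session might look like—the values
shown are placeholders and will differ on your system:
>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0)
>>> sys.executable
'/usr/local/bin/python3.12'
>>> sys.prefix
'/usr/local'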
Python modules
Modules are containers of Python objects that you load via the import
statement. They’re organized below Lib (Windows) or lib/python3.x (Linux
and macOS) with some platform-dependent variations. Third-party
packages go into a subdirectory named site-packages.
Modules come in various forms and shapes. If you’ve worked with Python,
you’ve likely used most of them already. Let’s go over the different kinds:
Simple modules
A single file containing Python source code.
Packages
A directory containing an __init__.py module, along with any number of
submodules.
Namespace packages
A directory with submodules but no __init__.py file; its contents can be
spread across multiple locations on the module path.
Extension modules
Dynamic libraries with compiled code, typically written in C, carrying a
binary extension such as .so or .pyd.
Built-in modules
Some modules from the standard library, such as the sys and
builtins modules, are compiled into the interpreter. The variable
sys.builtin_module_names lists all of these modules.
Frozen modules
Some modules from the standard library are written in Python but
have their bytecode embedded in the interpreter. Originally, only
core parts of importlib got this treatment. Recent versions of
Python freeze every module that’s imported during interpreter
startup, such as os and io .
NOTE
The term package carries some ambiguity in the Python world. It refers both to modules and to the
artifacts used for distributing modules, also known as distributions. Unless stated otherwise, this
book uses package as a synonym for distribution.
You can find out where a module comes from using importlib from the
standard library. Every module has an associated ModuleSpec object
whose origin attribute contains the location of the source file or
dynamic library for the module, or a fixed string like "built-in" or
"frozen" . The cached attribute stores the location of the bytecode
for a pure Python module. Example 2-1 shows the origin of each module in
the standard library.
import importlib.util
import sys

# Print each standard-library module with the origin recorded in its spec.
for name in sorted(sys.stdlib_module_names):
    if spec := importlib.util.find_spec(name):
        print(f"{name:30} {spec.origin}")
In the same vein, the importlib.metadata module lets you list the
distributions installed in an environment, along with their versions:
import importlib.metadata

distributions = importlib.metadata.distributions()
for distribution in sorted(distributions, key=lambda d: d.name):
    print(f"{distribution.name:30} {distribution.version}")
Entry-point scripts
Environments also contain entry-point scripts: small executable scripts
that start an application by importing a module and calling a function in
it.
This mechanism has two key benefits. First, you can launch the application
in a shell by running a simple command—say, pydoc3 for Python’s built-
in documentation browser. Second, entry-point scripts use the interpreter
and modules from their environment, sparing you surprises about wrong
Python versions or missing third-party packages.
Package installers, like pip, can generate entry-point scripts for third-party
packages they install. Package authors only need to designate the function
that the script should invoke. This is a handy method to provide an
executable for a Python application (see “Entry-point Scripts”).
Platforms differ in how they let you execute entry-point scripts directly. On
Linux and macOS, they’re regular Python files with execute permission,
such as the one shown in Example 2-3. Windows embeds the Python code
in a binary file in the Portable Executable (PE) format—more commonly
known as a .exe file. The binary launches the interpreter with the embedded
code.
#!/usr/local/bin/python3.12
import pydoc
if __name__ == "__main__":
    pydoc.cli()
The #! line is known as a shebang on Unix-like operating systems. When you run the script, the
program loader uses the line to locate and launch the interpreter. The program loader is the part of
the operating system that loads a program into main memory.
Other components
Shared libraries
Installations can include shared libraries, such as the Python runtime
itself in library form, which applications embedding Python link against.
Header Files
Header files for the Python/C API, needed to compile extension modules or
embed the interpreter.
Tcl/Tk
Some installers bundle the Tcl/Tk toolkit, which the standard tkinter
module uses for graphical user interfaces.
On Linux, the per-user environment keeps site packages in
~/.local/lib/python3.x/site-packages and entry-point scripts in
~/.local/bin. (Fedora places extension modules under lib64.)
TIP
The per-user script directory may not be on PATH by default. If you install applications into the per-
user environment, remember to edit your shell profile to update the search path. Pip issues a friendly
reminder when it detects this situation.
Per-user environments have an important shortcoming: by design, they’re
not isolated from the global environment. You can still import system-wide
site packages if they’re not shadowed by per-user modules with the same
name. Applications in the per-user environment also aren’t isolated from
each other—in particular, they can’t depend on incompatible versions of
another package. Even applications in the system-wide environment can
import modules from the per-user environment.
And there’s another drawback: you can’t install packages into the per-user
environment if the Python installation is marked as externally managed—
for example, if you installed Python using the package manager of your
distribution.
In “Installing Applications with Pipx”, I’ll introduce pipx, which lets you
install applications in isolated environments. It uses the per-user script
directory to put applications onto your search path, but relies on virtual
environments under the hood.
Virtual Environments
Installing packages
Virtual environments include pip as a means to install packages into them.
Let’s create a virtual environment, install httpx (an HTTP client library),
and launch an interactive session. On Windows, enter the commands below.
> py -m venv .venv
> .venv\Scripts\python -m pip install httpx
> .venv\Scripts\python
On Linux and macOS, enter the commands below. There’s no need to spell
out the path to the interpreter if the environment uses the well-known name
.venv. The Python Launcher for Unix selects its interpreter by default.
$ py -m venv .venv
$ py -m pip install httpx
$ py
Virtual environments come with the version of pip that was current when
Python was released. This can be a problem when you’re working with an
old Python release. Create the environment with the option --upgrade-
deps to ensure you get the latest pip release from the Python Package
Index.
You can also create a virtual environment without pip using the option --
without-pip and install packages with an external installer. If you have
pip installed globally, you can pass the target environment using its --
python option, like this:
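A sketch—assuming a reasonably recent pip that accepts an environment
directory for --python (older versions want the path to the interpreter
instead) and a virtual environment in the .venv directory:
$ pip --python=.venv install httpx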
Activation scripts
Virtual environments come with activation scripts for common shells.
These scripts make the environment more convenient to work with in an
interactive session:
They prepend the script directory to the PATH variable. This allows
you to invoke python , pip , and entry-point scripts without
prefixing them with the path to the environment.
They set the VIRTUAL_ENV environment variable to the location of
the virtual environment. Tools like the Python Launcher use this
variable to detect that the environment is active.
They update your shell prompt to provide a visual reference which
environment is active, if any. By default, the prompt uses the name of
the directory where the environment is located.
TIP
You can provide a custom prompt using the option --prompt when creating the environment. The
special value . designates the current directory; it’s particularly useful when you’re inside a project
repository.
On macOS and Linux, you need to source the activation script to allow it to
affect your current shell session. Here’s an example for Bash and similar
shells:
$ source .venv/bin/activate
Environments come with activation scripts for some other shells, as well.
For example, if you use the Fish shell, source the supplied activate.fish
script instead.
On Windows, you invoke the activation script directly:
> .venv\Scripts\activate
When you’re done, you can revert the changes with the deactivate command:
$ deactivate
How does Python know to import a third-party package like httpx from
the virtual environment instead of the Python installation? The location
can’t be hardcoded in the interpreter binary, given that virtual environments
share the interpreter with the Python installation. Instead, Python looks at
the location of the python command you used to launch the interpreter.
If its parent directory contains a pyvenv.cfg file, Python treats that file as a
landmark for a virtual environment and imports third-party modules from
the site-packages directory beneath.
This explains how you import third-party modules from the virtual
environment, but how does Python find modules from the standard library?
After all, they’re neither copied nor linked into the virtual environment.
Again, the answer lies in the pyvenv.cfg file: When you create a virtual
environment, the interpreter records its own location under the home key
in this file. If it later finds itself in a virtual environment, it looks for the
standard library relative to that home directory.
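For illustration, a pyvenv.cfg created by a recent CPython might look
roughly like this—the paths and version are placeholders:
home = /usr/local/bin
include-system-site-packages = false
version = 3.12.2
executable = /usr/local/bin/python3.12
command = /usr/local/bin/python3 -m venv /home/user/project/.venv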
NOTE
The name pyvenv.cfg is a remnant of the pyvenv script which used to ship with Python. The py -
m venv form makes it clearer which interpreter you use to create the virtual environment—and
thus which interpreter the environment itself will use.
While the virtual environment has access to the standard library in the
system-wide environment, it’s isolated from its third-party modules.
(Although not recommended, you can give the environment access to those
modules as well, using the --system-site-packages option when
creating the environment.)
How does pip know where to install packages? The short answer is that pip
asks the interpreter it’s running on, and the interpreter derives the location
from its own path—just like when you import a module. This is why it’s
best to run pip with an explicit interpreter using the py -m pip idiom. If
you invoke pip directly, the system searches your PATH and may come
up with the entry-point script from a different environment.
That’s precisely what pipx does, and it leverages a simple idea to make it
possible: it copies or symlinks the entry-point script for the application from
its virtual environment into a directory on your search path. Entry-point
scripts contain the full path to the environment’s interpreter, so you can
copy them anywhere you want, and they’ll still work.
Pipx in a Nutshell
Let me show you how this works in a nutshell—the commands below are
for Linux or macOS. First, you create a shared directory for the entry-point
scripts of your applications and add it to your PATH environment variable:
$ mkdir -p ~/.local/bin
$ export PATH="$HOME/.local/bin:$PATH"
Next, you install an application in a dedicated virtual environment—I’ve
chosen the Black code formatter as an example:
$ py -m venv black
$ black/bin/python -m pip install black
Finally, you copy the entry-point script into the directory you created in the
first step—that would be a script named black in the bin directory of the
environment:
$ cp black/bin/black ~/.local/bin
Now you can invoke black even though the virtual environment is not
active:
$ black --version
black, 24.2.0 (compiled: yes)
Python (CPython) 3.12.2
On top of this simple idea, the pipx project has built a cross-platform
package manager for Python applications with a great developer
experience.
TIP
If there’s a single Python application that you should install on a development machine, pipx is
probably it. It lets you install, run, and manage all the other Python applications in a way that’s
convenient and avoids trouble.
Installing Pipx
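The first step isn’t shown in this draft. One way—an assumption on my
part; your platform’s package manager may also provide a pipx package—is
to install pipx into the per-user environment with pip:
$ py -m pip install --user pipx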
$ pipx ensurepath
The second step also puts the pipx command itself on your search path.
If you don’t already have shell completion for pipx, activate it by following
the instructions for your shell, which you can print with this command:
$ pipx completions
With pipx installed on your system, you can use it to install and manage
applications from the Python Package Index (PyPI). For example, here’s
how you would install Black with pipx:
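That is:
$ pipx install black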
You can also use pipx to upgrade, reinstall, or uninstall applications—
individually, or all at once with the commands below—and to list the
applications installed on your system:
$ pipx upgrade-all
$ pipx reinstall-all
$ pipx uninstall-all
$ pipx list
Sometimes the command provided by an application has a different name
than its package on PyPI. If you find yourself in this situation, provide
the PyPI name using the --spec option, like this:
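A sketch using httpie as an example of my choosing—its PyPI package is
named httpie, but the command it provides is http:
$ pipx run --spec httpie http --version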
TIP
Use pipx run <app> as the default method to install and run developer tools from PyPI. Use
pipx install <app> if you need more control over application environments, for example if
you need to install plugins. (Replace <app> with the name of the app.)
Configuring Pipx
By default, pipx installs applications on the same Python version that it runs
on itself. This may not be the latest stable version, particularly if you
installed pipx using a system package manager like APT. I recommend
setting the environment variable PIPX_DEFAULT_PYTHON to the latest
stable Python if that’s the case. Many developer tools you run with pipx
create their own virtual environments; for example, virtualenv, Nox, tox,
Poetry, and Hatch all do. It’s worthwhile to ensure that all downstream
environments use a recent Python version by default.
Under the hood, pipx uses pip as a package installer. This means that any
configuration you have for pip also carries over to pipx. A common use
case is installing Python packages from a private index instead of PyPI,
such as a company-wide package repository.
You can use pip config to set the URL of your preferred package
index persistently:
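For example (the URL is a placeholder, as in the next snippet):
$ pip config set global.index-url https://example.com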
Alternatively, you can set the package index for the current shell session
only. Most pip options are also available as environment variables:
$ export PIP_INDEX_URL=https://example.com
Both methods cause pipx to install applications from the specified index.
Managing Environments with uv
The tool uv is a drop-in replacement for core Python packaging tools,
written in the Rust programming language. It offers order-of-magnitude
performance improvements over the Python tools it replaces, in a single
static binary without dependencies. While its uv venv and uv pip
subcommands aim for compatibility with virtualenv and pip, uv also
embraces evolving best practices, such as operating in a virtual environment
by default.
$ pipx install uv
$ uv venv
Specify the interpreter for the virtual environment using the --python
option with a specification like 3.12 or python3.12 ; a full path to an
interpreter also works. Uv discovers available interpreters by scanning your
PATH . On Windows, it also inspects the output of py --list-paths .
If you don’t specify an interpreter, uv defaults to python3 on Linux and
macOS, and python.exe on Windows.
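For example—3.12 and httpx are placeholders of mine—you could create an
environment for a specific version and install a package into it like
this:
$ uv venv --python 3.12
$ uv pip install httpx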
NOTE
Despite its name, uv venv emulates the Python tool virtualenv, not the built-in venv module.
Virtualenv creates environments with any Python interpreter on your system. It combines interpreter
discovery with aggressive caching to make this fast and flawless.
This section takes a deep dive into the other mechanism that links programs
to an environment: module import, which is the process of locating and
loading Python modules for a program.
TIP
In a nutshell, just like the shell searches PATH for executables, Python searches sys.path for
modules. This variable holds a list of locations from where Python can load modules—most
commonly, directories on the local filesystem.
Having the import system in the standard library lets you inspect and
customize the import mechanism from within Python. For example, the
import system supports loading modules from directories and from zip
archives out of the box. But entries on sys.path can be anything really
—say, a URL or a database query—as long as you register a function in
sys.path_hooks that knows how to find and load modules from these
path entries.
Module Objects
When you import a module, the import system returns a module object, an
object of type types.ModuleType . Any global variable defined by the
imported module becomes an attribute of the module object. This allows
you to access the module variable in dotted notation ( module.var ) from
the importing code.
Essentially, the import system executes the module’s code using the
module’s __dict__ as the namespace:
exec(code, module.__dict__)
Additionally, module objects have some special attributes. For instance, the
__name__ attribute holds the fully-qualified name of the module, like
email.message . The __spec__ attribute holds the module spec,
which I’ll talk about shortly. Packages also have a __path__ attribute,
which contains locations to search for submodules.
NOTE
Most commonly, the __path__ attribute of a package contains a single entry: the directory
holding its __init__.py file. Namespace packages, on the other hand, can be distributed across
multiple directories.
When you first import a module, the import system stores the module object
in the sys.modules dictionary, using its fully-qualified name as a key.
Subsequent imports return the module object directly from
sys.modules . This mechanism brings a number of benefits:
Performance
Loading a module can be expensive. The cache ensures that the work of
locating, reading, and executing a module happens at most once, no matter
how often the module is imported.
Idempotency
Importing modules can have side effects, for example when
executing module-level statements. Caching modules in
sys.modules ensures that these side effects happen only once.
The import system also uses locks to ensure that multiple threads can
safely import the same module.
Recursion
Caching also makes circular imports workable: if a module that is still
initializing gets imported again, Python returns the partially
initialized module from sys.modules instead of recursing indefinitely.
Module Specs
The module spec is the link between those two steps. A module spec
contains metadata about a module such as its name and location, as well as
an appropriate loader for the module (Table 2-5). You can also access most
of the metadata from the module spec using special attributes directly on
the module object.
Table 2-5. Module spec attributes and the corresponding module attributes
(columns: Spec attribute, Module attribute, Description)
The import system finds and loads modules using two kinds of objects.
Finders ( importlib.abc.MetaPathFinder ) are responsible for
locating modules given their fully-qualified names. When successful, their
find_spec method returns a module spec with a loader; otherwise, it
returns None . Loaders ( importlib.abc.Loader ) are objects with
an exec_module function which load and execute the module’s code.
The function takes a module object and uses it as a namespace when
executing the module. The finder and loader can be the same object, which
is then known as an importer.
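For illustration, you can look up a module spec and its loader yourself—
the path and object address shown are placeholders:
>>> import importlib.util
>>> spec = importlib.util.find_spec("email.message")
>>> spec.name, spec.origin
('email.message', '/usr/local/lib/python3.12/email/message.py')
>>> spec.loader
<_frozen_importlib_external.SourceFileLoader object at 0x...>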
The zip importer works similarly, except that it doesn’t support extension
modules, because current operating systems don’t allow loading dynamic
libraries from a zip archive.
When your program can’t find a specific module, or imports the wrong
version of a module, it can help to take a look at sys.path , the module
path. But where do the entries on sys.path come from in the first
place? Let’s unravel some of the mysteries of the module path.
When the interpreter starts up, it constructs the module path in two steps.
First, it builds an initial module path using some built-in logic. Most
importantly, this initial path includes the standard library. Second, the
interpreter imports the site module from the standard library. The
site module extends the module path to include the site packages from
the current environment.
In this section, we’ll take a look at how the interpreter constructs the initial
module path with the standard library. The next section explains how the
site module appends directories with site packages.
NOTE
You can find the built-in logic for constructing sys.path in Modules/getpath.py in the CPython
source code. Despite appearances, this is not an ordinary module. When you build Python, its code is
frozen to bytecode and embedded in the executable.
The locations on the initial module path fall into three categories, and they
occur in the order given below:
1. The current directory or the directory containing the Python script (if
any)
2. The locations in the PYTHONPATH environment variable (if set)
3. The locations of the standard library
Table 2-6 shows the remaining entries on the initial module path, which are
dedicated to the standard library. Locations are prefixed with the path to the
installation, and may differ in details on some platforms. Notably, Fedora
places the standard library under lib64 instead of lib.
Table 2-6. Standard-library locations on the initial module path
(columns: Windows, Linux and macOS, Description)
The location of the standard library is not hardcoded in the interpreter (see
“Virtual Environments”). Rather, Python looks for landmark files on the
path to its own executable, and uses them to locate the current environment
( sys.prefix ) and the Python installation ( sys.base_prefix ).
One such landmark file is pyvenv.cfg, which marks a virtual environment
and points to its parent installation via the home key. Another landmark is
os.py, the file containing the standard os module: Python uses os.py to
discover the prefix outside of a virtual environment, and to locate the
standard library itself.
Site Packages
The site module adds the following path entries if they exist on the
filesystem:
Site packages
This directory holds third-party packages from the current
environment, which is either a virtual environment or a system-wide
installation. On Fedora and some other systems, pure Python
modules and extension modules are in separate directories. Many
Linux systems also separate distribution-owned site packages under
/usr from local site packages under /usr/local.
In the general case, the site packages are in a subdirectory of the standard
library named site-packages. If the site module finds a pyvenv.cfg file on
the interpreter path, it uses the same relative path as in a system installation,
but starts from the virtual environment marked by that file. The site
module also modifies sys.prefix to point to the virtual environment.
.pth files
Within site packages directories, any file with a .pth extension can
list additional directories for sys.path , one directory per line.
This works similar to PYTHONPATH , except that modules in these
directories will never shadow the standard library. Additionally, .pth
files can import modules directly—the site module executes any
line starting with import as Python code. Third-party packages
can ship .pth files to configure sys.path in an environment.
Some packaging tools use .pth files behind the scenes to implement
editable installs. An editable install places the source directory of
your project on sys.path , making code changes instantly visible
inside the environment.
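For illustration—the path and file name are hypothetical—a .pth file is
plain text with one directory per line; a file such as _myproject.pth
placed in site-packages could contain just:
/home/user/myproject/src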
If you run the site module as a command, it prints out your current
module path, as well as some information about the per-user environment:
$ py -m site
sys.path = [
'/home/user',
'/usr/local/lib/python312.zip',
'/usr/local/lib/python3.12',
'/usr/local/lib/python3.12/lib-dynload',
    '/home/user/.local/lib/python3.12/site-packages',
'/usr/local/lib/python3.12/site-packages',
]
USER_BASE: '/home/user/.local' (exists)
USER_SITE: '/home/user/.local/lib/python3.12/site-packages' (exists)
ENABLE_USER_SITE: True
If you’ve read this far, the module path may almost seem a little—
byzantine?
As you’ve seen in this section, the truth is far more complex than that
simple story. But I’ve got good news for you: Python lets you make that
story true. The -P interpreter option omits the directory containing your
script from the module path (or the current directory, if you’re running your
program with py -m <module> ). The -I interpreter option omits the
per-user environment from the module path, as well as any directories set
with PYTHONPATH . Use both options when running your Python
programs if you want a more predictable module path.
If you re-run the site module with the -I and -P options, the module
path is cut down to just the standard library and site packages:
$ py -IPm site
sys.path = [
'/usr/local/lib/python312.zip',
'/usr/local/lib/python3.12',
'/usr/local/lib/python3.12/lib-dynload',
'/usr/local/lib/python3.12/site-packages',
]
USER_BASE: '/home/user/.local' (exists)
USER_SITE: '/home/user/.local/lib/python3.12/site-packages' (exists)
ENABLE_USER_SITE: False
The current directory no longer appears on the module path, and the per-
user site packages are gone, too—even though the directory exists on this
system.
Summary
In this chapter, you’ve learned what Python environments are, where to find
them, and how they look on the inside. At the core, a Python environment
consists of the Python interpreter and Python modules, as well as entry-
point scripts to run Python applications. Environments are tied to a specific
version of the Python language.
Install Python applications with pipx to make them available globally while
keeping them in separate virtual environments. You can install and run an
application using a single command, such as pipx run black . Set the
PIPX_DEFAULT_PYTHON variable to ensure pipx installs tools on the
current Python release.
Uv is a blazingly fast drop-in replacement for virtualenv and pip with better
defaults. Use uv venv to create a virtual environment, and uv pip to
install packages into it. Both commands use the .venv directory by default,
just like the py tool on Unix. The --python option lets you select the
Python version for the environment.
In the final section of this chapter, you’ve learned how Python uses
sys.path to locate modules when you import them, and how the
module path is constructed during interpreter startup. You’ve also learned
how module import works under the hood, using finders and loaders as well
as the module cache. Interpreter discovery and module import are the key
mechanisms that link Python programs to an environment at runtime.
1
There’s also a pythonw.exe executable that runs programs without a console window, like GUI
applications.
2
A shared library is a file with executable code that multiple programs can use at runtime. The
operating system only keeps a single copy of the code in memory.
3
Windows installations don’t include an entry-point script for pydoc —launch it using py -m
pydoc instead.
4
Historically, macOS framework builds pioneered per-user installation before it became a standard in
2008.
5
This is a good thing: installing and uninstalling Python packages behind your package manager’s
back introduces a real chance of breaking your system.
6
You could force the use of symbolic links on Windows via the --symlinks option—but don’t.
There are subtle differences in the way these work on Windows. For example, the File Explorer
resolves the symbolic link before it launches Python, which prevents the interpreter from detecting
the virtual environment.
7
Before Python 3.12, the venv module also pre-installed setuptools for the benefit of legacy
packages that don’t declare it as a build dependency.
8
Internally, pip queries the sysconfig module for an appropriate installation scheme—a Python
environment layout. This module constructs the installation scheme using the build configuration of
Python and the location of the interpreter in the filesystem.
9
At the time of writing in 2024, pipx caches temporary environments for 14 days.
10
For modules located within a package, the __path__ attribute of the package takes the place of
sys.path .
Part II. Python Projects
Chapter 3. Python Packages
In this chapter you’ll learn how to package your Python projects for
distribution. A package is a single file containing an archive of your code
along with metadata that describes it, like the project name and version.
NOTE
Python folks use the word package for two distinct concepts. Import packages are modules that
contain other modules. Distribution packages are archive files for distributing Python software—
they’re the subject of this chapter.
You can install a package into a Python environment using a package
installer like pip. You can also upload it to a package repository for the
benefit of others. The Python Software Foundation (PSF) operates a
package repository known as the Python Package Index (PyPI). If your
package is on PyPI, anyone can install it by passing its project name to
pip install .
Packaging your project makes it easy to share with others, but there's
another benefit: when you install your package, it becomes a first-class
citizen of a Python environment.
In this chapter, I’ll explain how you can package your Python projects and
introduce you to tools that help with packaging tasks. The chapter has three
parts:
In the first part, I’ll talk about the life of a Python package. I’ll also
introduce an example application that you’ll use throughout this book.
And I’ll ask: why would you want to package your code at all?
In the second part, I’ll introduce Python’s package configuration file,
pyproject.toml, and tools for working with packages: build ,
hatchling , and Twine. The tools pip, uv, and pipx also make a
reappearance. Finally, I’ll introduce Rye, a project manager that ties
these packaging tools together into a unified workflow. Along the way,
you’ll learn about build frontends and backends, wheels and sdists,
editable installs, and the src layout.
In the third part, I’ll look at project metadata in detail—the various
fields you can specify in pyproject.toml to define and describe your
package, and how to make efficient use of them.
Once you've published the package, a user can fetch it by specifying its
name and version.
You can install a freshly built package directly into an environment, without
uploading it to a package repository first—for example, when you’re testing
your package, or when you’re its only user.
In real life, tools often combine fetching and installing, building and
installing, and even building and publishing, into a single command.
An Example Application
Many applications start out as small, ad-hoc scripts. Example 3-1 fetches a
random article from Wikipedia and displays its title and summary in the
console. The script restricts itself to the standard library, so it runs in any
Python 3 environment.
import json
import textwrap
import urllib.request

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

def main():
    with urllib.request.urlopen(API_URL) as response:
        data = json.load(response)

    print(data["title"], end="\n\n")
    print(textwrap.fill(data["extract"]))

if __name__ == "__main__":
    main()
The title and extract keys hold the title of the Wikipedia
page and a short plain text extract, respectively. The
textwrap.fill function wraps the text so that every line is at
most 70 characters long.
> py -m random_wikipedia_article
Jägersbleeker Teich
Why Packaging?
Sharing a script like Example 3-1 doesn’t require packaging. You can
publish it on a blog or a hosted repository, or send it to friends by email or
chat. Python’s ubiquity, the “batteries included” approach of its standard
library, and its nature as an interpreted language make this possible.
The ease of sharing modules with the world was a boon to Python’s
adoption in the early days. The Python programming language predates the
advent of language-specific package repositories—PyPI didn't come about
for more than a decade.¹
Binary extensions
Extension modules written in C or other compiled languages must be
built for each platform. Packages let you distribute prebuilt binaries,
so users don't need a build toolchain on their systems.
Metadata
You can embed metadata inside a module, using attributes like
__author__ , __version__ , or __license__ . But then
tools have to execute the module to read those attributes. Packages
contain static metadata that any tool can read without running
Python.
As you’ve seen, packaging solves many problems, but what’s the overhead?
In short, you drop a declarative file named pyproject.toml into your project
—a standard file that specifies the project metadata and its build system. In
return, you get commands to build, publish, and install your package.
[project]
name = "random-wikipedia-article"
version = "0.1"

[project.scripts]
random-wikipedia-article = "random_wikipedia_article:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
TIP
PyPI projects share a single namespace—their names aren’t scoped by the users or organizations
owning the projects. Choose a unique name such as random-wikipedia-article-{your-
name} , and rename the Python module accordingly.
At the top level, the pyproject.toml file can contain up to three sections—or
tables, as the TOML standard calls them:
[project]
The project table holds the project metadata. The name and
version fields are mandatory. For real projects, you should
provide additional information, such as a description, the license, and
the required Python version (see “Project Metadata”). The
scripts section declares the name of the entry-point script and
the function it should call.
[build-system]
The build-system table declares how to build packages for the
project: which build backend to use and which packages are required
to run it.
[tool]
The tool table stores configurations for each tool used by the
project. For example, the Ruff linter reads its configuration from the
[tool.ruff] table, while the type checker mypy uses
[tool.mypy] .
THE TOML FORMAT
Lists are termed arrays in TOML and use the same notation as Python: for
example, requires = ["hatchling"] .
Dictionaries are known as tables and come in several equivalent forms. You
can put the key/value pairs on separate lines, preceded by the table name in
square brackets:
[project]
name = "foo"
version = "0.1"
You can load a TOML file using the standard tomllib module:
import tomllib
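For example, here's a minimal sketch that parses a pyproject.toml file and reads the project name; note that tomllib expects the file to be opened in binary mode:
import tomllib

with open("pyproject.toml", "rb") as file:
    data = tomllib.load(file)

print(data["project"]["name"])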
Python represents a TOML file as a dictionary, where keys are strings and
values can be strings, integers, floats, dates, times, lists, or dictionaries.
Here’s what a pyproject.toml file looks like in Python:
{
    "project": {
        "name": "random-wikipedia-article",
        "version": "0.1",
        "scripts": {
            "random-wikipedia-article": "random_wikipedia_article:main"
        }
    },
    "build-system": {
        "requires": ["hatchling"],
        "build-backend": "hatchling.build"
    }
}
NOTE
A build frontend is an application that orchestrates the build process for a Python package. Build
frontends don’t know how to assemble packaging artifacts from source trees. The tool that does the
actual building is known as the build backend.
Open a terminal, change to the project directory, and invoke build with
pipx:
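A sketch of the invocation, using pipx's run command:
$ pipx run build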
As you can see in its output, build delegates the actual work to
hatchling , the build backend you designated in Example 3-2. A build
frontend uses the build-system table to determine the build backend
for the project (Table 3-1).
Table 3-1. The build-system table
Figure 3-2 shows how the build frontend and the build backend collaborate
to build a package.
The build frontend triggers the actual package build, in two steps.
First, it imports the module or object declared in build-
backend . Second, it invokes well-known functions for creating
packages and related tasks, known as build hooks.
$ py -m venv buildenv
$ buildenv/bin/python -m pip install hatchling
$ buildenv/bin/python
>>> import hatchling.build as backend
>>> backend.get_requires_for_build_wheel()
Some build frontends let you build packages in your current environment instead of an isolated
one. If you disable build isolation, the frontend only checks that the build dependencies are
present, rather than installing them. If it installed them, the build dependencies of different
packages might conflict with each other and with your project's runtime dependencies.
Why separate the build frontend from the build backend? It means that tools
can trigger package builds without knowing the intricacies of the build
process. For example, package installers like pip and uv build packages on
the fly when you install from a source directory (see “Installing Projects
from Source”).
Standardizing the contract between build frontends and build backends has
brought tremendous diversity and innovation to the packaging ecosystem.
Build frontends include build , pip, and uv, the backend-agnostic Python
project managers Rye, Hatch, and PDM, and test automation tools like tox.
Build backends include those shipped with the project managers Flit, Hatch,
PDM, and Poetry, the traditional build backend setuptools, as well as exotic
builders like Maturin, a build backend for Python modules written in the
Rust programming language, and Sphinx Theme Builder, a build backend
for Sphinx documentation themes (Table 3-2).
Table 3-2. Build backends (columns: Project, requires, build-backend). See
the official documentation of each tool for any recommended version bounds.
First, register an account using the link on the front page of TestPyPI.
Second, create an API token from your account page and copy the token to
your preferred password manager. You can now upload the packages in dist
using Twine, the official PyPI upload tool.
View at:
https://test.pypi.org/project/random-wikipedia-article/
Congratulations, you have published your first Python package! Let’s install
the package from TestPyPI:
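One way to do this, sketched here under the assumption that you install into an active virtual environment with pip and published under the name used in this chapter, is to point the installer at the TestPyPI index:
$ py -m pip install --index-url https://test.pypi.org/simple/ random-wikipedia-article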
$ random-wikipedia-article
You could build a wheel with build and install it into a virtual
environment:
$ uv venv
$ uv pip install .
If your project comes with an entry-point script, you can also install it with
pipx:
$ pipx install .
installed package random-wikipedia-article 0.1,
These apps are now globally available
- random-wikipedia-article
Editable Installs
Editable installs achieve the best of both worlds by installing your package
in a special way that redirects imports to the source tree (see “Site
Packages”). You can think of this mechanism as a kind of “hot reloading”
for Python packages. The --editable option ( -e ) works with uv, pip,
and pipx:
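For example, assuming the project directory is your current directory, any of these commands performs an editable install:
$ uv pip install --editable .
$ py -m pip install --editable .
$ pipx install --editable .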
Once you’ve installed your package in this way, you won’t need to reinstall
it to see changes to the source code—only when you edit pyproject.toml to
change the project metadata or add a third-party dependency.
Editable installs are modeled after the development mode feature from
setuptools, if you’ve been around long enough to be familiar with it. But
unlike setup.py develop , they rely on standard build hooks that any
build backend can provide.
Project Layout
Dropping a pyproject.toml next to a single-file module is an appealingly
simple approach. Unfortunately, this project layout comes with a serious
footgun, as you’ll see in this section. Let’s start by breaking something in
the project:
def main():
    raise Exception("Boom!")
Before publishing your package, run a last smoke test with a locally built
wheel:
A bug found is a bug fixed. After removing the offending line, verify that
the program works as expected:
$ py -m random_wikipedia_article
Cystiscus viaderi
All good, time to cut a release! First, push your fix and a Git tag for the new
version to your code repository. Next, use Twine to upload the wheel to
PyPI:
$ pipx run twine upload dist/*
But, oh no—you never rebuilt the wheel. That bug is now in a public
release! How could that happen? The culprit is the module path: when you
verified the fix with py -m random_wikipedia_article , Python
imported the module from the current directory, not from the wheel you had
built earlier, so the stale wheel with the bug went unnoticed. You could
train yourself to always rebuild and reinstall before a release.
Instead, move your module out of the top-level directory so folks can't
import it by mistake. By convention, Python source trees go into the src
directory—which is why the arrangement is known as src layout in the
Python community.
At this point, it also makes sense to convert your single-file module into an
import package. Replace the random_wikipedia_article.py file by a
random_wikipedia_article directory with a __init__.py module.
Placing your code in an import package is mostly equivalent to having it in
a single-file module—but there’s one difference: you can’t run the
application with py -m random_wikipedia_article unless you
also add the special __main__.py module to the package (Example 3-3).
# __main__.py
from random_wikipedia_article import main

main()
random-wikipedia-article
├── pyproject.toml
└── src
└── random_wikipedia_article
├── __init__.py
└── __main__.py
An import package makes it easier for your project to grow: you can move
code into separate modules and import it from there. For example, you
could extract the code that talks to the Wikipedia API into a function
fetch . Next, you might move the function to a module fetch.py in the
package. Here’s how you’d import the function from __init__.py:
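As a sketch, assuming the function lives in src/random_wikipedia_article/fetch.py, the import in __init__.py would look like this:
from random_wikipedia_article.fetch import fetch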
The answer has to do with the nature and history of the Python project:
Python is a decentralized open source project driven by a community of
thousands of volunteers, with a history spanning more than three decades of
organic growth. This makes it hard for a single packaging tool to cater to all
demands and become firmly established.³
But the Unix approach is no longer your only choice. Python project
managers provide a more integrated workflow. Among the first, Poetry (see
Chapter 5) has set itself the goal of reinventing Python packaging and
pioneered ideas such as static metadata and cross-platform lock files.
Your first step with Rye is initializing a new project with rye init . If
you don’t pass the project name, Rye uses the name of the current directory.
Use the --script option to include an entry-point script:
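Assuming you want the project name used throughout this book, the invocation would look roughly like this:
$ rye init --script random-wikipedia-article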
random-wikipedia-article
├── .git
├── .gitignore
├── .python-version
├── .venv
├── README.md
├── pyproject.toml
└── src
└── random_wikipedia_article
├── __init__.py
└── __main__.py
Many of Rye’s commands are frontends to tools that have become a de-
facto standard in the Python world, or that promise to become one in the
future. The command rye build creates packages with build , the
command rye publish uploads them using Twine, and the command
rye sync performs an editable install using uv:
$ rye build
$ rye publish --repository testpypi --repository-
$ rye sync
There’s much more to rye sync , though. Rye manages private Python
installations using the Python Standalone Builds project (see “A Brave New
World: Installing with Hatch and Rye”), and rye sync fetches each
Python version on first use. The command also generates a lock file for the
project dependencies and synchronizes the environment with that file (see
Chapter 4).
random_wikipedia_article-0.1.tar.gz
random_wikipedia_article-0.1-py2.py3-none-any.whl
These artifacts are known as wheels and sdists. Wheels are ZIP archives
with a .whl extension, and they’re built distributions—for the most part,
installers extract them into the environment as-is. Sdists, by contrast, are
source distributions: they’re compressed archives of the source code with
packaging metadata. Sdists require an additional build step to produce an
installable wheel.
TIP
The name “wheel” for a Python package is a reference to wheels of cheese. PyPI was originally
known as the Cheese Shop, after the Monty Python sketch about a cheese shop with no cheese
whatsoever. (These days, PyPI serves over a petabyte of packages per day.)
As a package author, you should build and publish both sdists and wheels
for your releases. This gives users a choice: They can fetch and install the
wheel if their environment is compatible (which is always the case for a
pure Python package)—or they can fetch the sdist and build a wheel from it
locally (see Figure 3-3).
Figure 3-3. Wheels and sdists
For consumers of packages, sdists come with a few caveats. First, the build
step involves arbitrary code execution, which can be a security concern.⁴
Second, installing wheels is much faster than installing sdists, especially for
legacy setup.py-based packages. Lastly, users may encounter confusing
build errors for extension modules if they don’t have the required build
toolchain on their system.
Generally, a pure Python package has a single sdist and a single wheel for a
given release. Binary extension modules, on the other hand, commonly
come in wheels for a range of platforms and environments.
TIP
If you’re an author of extension modules, check out the cibuildwheel project: it automates the
building and testing of wheels across multiple platforms, with support for GitHub Actions and
various other continuous integration (CI) systems.
WHEEL COMPATIBILITY TAGS
Installers select the appropriate wheel for an environment using three so-
called compatibility tags that are embedded in the name of each wheel file:
Python tag
The Python implementation and version the wheel targets, such as
py3 for any Python 3 interpreter or cp312 for CPython 3.12.
ABI tag
The Python ABI (application binary interface) the wheel requires,
such as none for pure Python wheels or abi3 for the CPython
stable ABI.
Platform tag
The operating system and processor architecture the wheel supports,
such as any for platform-independent wheels or
manylinux_2_28_x86_64 for Linux on x86-64.
Wheels with binary extension modules, on the other hand, have more
stringent compatibility requirements. Take a look at the compatibility tags
of these wheels, for example:
numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
NumPy is a fundamental library for scientific computing. Its wheel
targets a specific Python implementation and version (CPython 3.11),
operating system release (macOS 10.9 and above), and processor
architecture (x86-64).
cryptography-38.0.4-cp36-abi3-manylinux_2_28_x86_64.whl
Cryptography ships binary extensions as well. This wheel targets the
stable ABI of CPython 3.6 and later ( abi3 ), and Linux distributions
with glibc 2.28 or newer on x86-64 ( manylinux ).
Core Metadata
If you’re curious, you can extract a wheel using the unzip utility to see
the files installers place in the site-packages directory. Execute the
following commands in a shell on Linux or macOS—preferably inside an
empty directory. If you’re on Windows, you can follow along using the
Windows Subsystem for Linux (WSL).
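As a sketch, the following commands download a pure Python wheel for the attrs library and extract it; the pip download command and its --no-deps option are standard, but the exact wheel filename varies with the current release:
$ py -m pip download --no-deps attrs
$ unzip attrs-*.whl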
Besides the import packages (named attr and attrs in this case), the
wheel contains a .dist-info directory with administrative files. The
METADATA file in this directory contains the core metadata for the
package, a standardized set of attributes that describe the package for the
benefit of installers and other packaging tools. You can access the core
metadata of installed packages at runtime using the standard library:
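For example, a minimal sketch using the attrs package from above:
from importlib.metadata import metadata

meta = metadata("attrs")
print(meta["Name"], meta["Version"])
print(meta["Summary"])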
NOTE
The core metadata standards predate pyproject.toml by many years. Most project metadata fields
correspond to a core metadata field, but their names and syntax differ slightly. As a package author,
you can safely ignore this translation and focus on the project metadata.
Project Metadata
Build backends write out core metadata fields based on what you specify in
the project table of pyproject.toml. Table 3-3 provides an overview of
all the fields you can use in the project table.
Table 3-3. The project table
[project]
name = "random-wikipedia-article"
version = "0.1"
description = "Display extracts from random Wikipedia articles"
keywords = ["wikipedia"]
readme = "README.md"  # only if your project has a README file
license = { text = "MIT" }
authors = [{ name = "Your Name", email = "[email protected]" }]
classifiers = ["Topic :: Games/Entertainment :: Fortune Cookies"]
urls = { Homepage = "https://yourname.dev/projects/random-wikipedia-article" }
requires-python = ">=3.8"
dependencies = ["httpx>=0.27.0", "rich>=13.7.1"]
In the following sections, I’ll take a closer look at the various project
metadata fields.
Naming Projects
Your users specify this name to install the project with pip. This field also
determines your project’s URL on PyPI. You can use any ASCII letter or
digit to name your project, interspersed with periods, underscores, and
hyphens. Packaging tools normalize project names for comparison: all
letters are converted to lowercase, and punctuation runs are replaced by a
single hyphen (or underscore, in the case of package filenames). For
example, Awesome.Package , awesome_package , and
awesome-package all refer to the same project.
Project names are distinct from import names, the names users specify to
import your code. Import names must be valid Python identifiers, so they
can’t have hyphens or periods and can’t start with a digit. They’re case-
sensitive and can contain any Unicode letter or digit. As a rule of thumb,
you should have a single import package per distribution package and use
the same name for both (or a straightforward translation, like random-
wikipedia-article and random_wikipedia_article ).
Versioning Projects
Dynamic Fields
The pyproject.toml standard encourages projects to define their metadata
statically, rather than rely on the build backend to compute the fields during
the package build. Static metadata benefits the packaging ecosystem,
because it makes the fields accessible to other tools. It also reduces
cognitive overhead: build backends use the same configuration format and
populate the fields in a straightforward and transparent way.
But sometimes it’s useful to let the build backend fill in a field dynamically.
For example, the next section shows how you can derive the package
version from a Python module or Git tag instead of duplicating it in
pyproject.toml.
For this reason, the project metadata standard provides an escape hatch in
the form of dynamic fields. Projects are allowed to use a backend-specific
mechanism to compute a field on the fly if they list its name under the
dynamic key:
[project]
dynamic = ["version", "readme"]
Many projects declare their version at the top of a Python module, like this:
__version__ = "0.2"
Updating a frequently changing item in several locations is tedious and
error-prone. Some build backends therefore allow you to extract the version
number from the code instead of repeating it in project.version .
This mechanism is specific to your build backend, so you configure it in the
tool table. Example 3-5 demonstrates how this works with Hatch.
[project]
name = "random-wikipedia-article"
dynamic = ["version"]
[tool.hatch.version]
path = "random_wikipedia_article.py"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
You can also avoid the duplication by going in the other direction: Declare
the version statically in pyproject.toml and read it from the installed
metadata at runtime, as shown in Example 3-6.
Example 3-6. Reading the version from the installed metadata

from importlib.metadata import version

__version__ = version("random-wikipedia-article")
But don’t go and add this boilerplate to all your projects yet. Reading the
metadata from the environment isn’t something you want to do during
program startup. Third-party libraries like click perform the metadata
lookup on demand—for example, when the user specifies a command-line
option like --version . You can read the version on demand by
providing a __getattr__ function for your module (Example 3-7).⁵
from importlib.metadata import version

def __getattr__(name):
    if name != "__version__":
        msg = f"module {__name__} has no attribute {name}"
        raise AttributeError(msg)
    return version("random-wikipedia-article")
Alas, you still haven’t truly single-sourced the version. Most likely, you
also tag releases in your version control system (VCS) using a command
like git tag v1.0.0 . (If you don’t, you should—if a release has a
bug, the version tags help you find the commit that introduced it.)
Luckily, a number of build backends come with plugins that extract the
version number from Git, Mercurial, and similar systems. This technique
was pioneered by the setuptools-scm plugin. For Hatch, you can use
the hatch-vcs plugin, which is a wrapper around setuptools-scm
(Example 3-8).
Example 3-8. Deriving the project version from the version control
system
[project]
name = "random-wikipedia-article"
dynamic = ["version"]
[tool.hatch.version]
source = "vcs"
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
If you build this project from a repository and you’ve checked out the tag
v1.0.0 , Hatch will use the version 1.0.0 for the metadata. If you’ve
checked out an untagged commit, Hatch will instead generate a
developmental release like 0.1.dev1+g6b80314 .⁶ In other words, you
read the project version from Git during the package build, and from the
package metadata at runtime.
Entry-point Scripts
Entry-point scripts are small executables that launch the interpreter from
their environment, import a module and invoke a function (see “Entry-point
scripts”). Installers like pip generate them on the fly when they install a
package.
[project.scripts]
random-wikipedia-article = "random_wikipedia_article:main"
This declaration allows users to invoke the program using its given name:
$ random-wikipedia-article
[project.gui-scripts]
random-wikipedia-article-gui = "random_wikipedia_article:main"
Entry Points
You can also register submodules using dotted notation, as well as objects
within modules, using the format module:object :
[project.entry-points.some_application]
my-plugin = "my_plugin.submodule:plugin"
Let’s look at an example to see how this works. Random Wikipedia articles
make for fun little fortune cookies, but they can also serve as test fixtures⁷
for developers of Wikipedia viewers and similar apps. Let’s turn the app
into a plugin for the pytest testing framework. (Don’t worry if you haven’t
worked with pytest yet; I’ll cover testing in depth in Chapter 6.)
Pytest allows third-party plugins to extend its functionality with test fixtures
and other features. It defines an entry point group for such plugins named
pytest11 . You can provide a plugin for pytest by registering a module
in this group. Let’s also add pytest to the project dependencies.
[project]
dependencies = ["pytest"]

[project.entry-points.pytest11]
random-wikipedia-article = "random_wikipedia_article"
For simplicity, I’ve chosen the top-level module that hosted the main
function in Example 3-1. Next, extend pytest with a test fixture returning a
random Wikipedia article, as shown in Example 3-9.
import json
import urllib.request

import pytest

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

@pytest.fixture
def random_wikipedia_article():
    with urllib.request.urlopen(API_URL) as response:
        return json.load(response)
Example 3-10. A test function that uses the random article fixture
# test_wikipedia_viewer.py
def test_wikipedia_viewer(random_wikipedia_article):
    print(random_wikipedia_article["extract"])
    assert False
You can try this out yourself in an active virtual environment in the project
directory:
$ py -m pip install .
$ py -m pytest test_wikipedia_viewer.py
============================= test session starts ==============================
platform darwin -- Python 3.12.2, pytest-8.1.1, ...
rootdir: ...
plugins: random-wikipedia-article-0.1
collected 1 item

test_wikipedia_viewer.py F

    def test_wikipedia_viewer(random_wikipedia_article):
        print(random_wikipedia_article["extract"])
>       assert False
E       assert False

test_wikipedia_viewer.py:4: AssertionError
----------------------------- Captured stdout call -----------------------------
Halgerda stricklandi is a species of sea slug, a
marine gastropod mollusk in the family Discodorididae.
=========================== short test summary info ============================
FAILED test_wikipedia_viewer.py::test_wikipedia_viewer - assert False
============================== 1 failed in 1.10s ===============================
[project]
authors = [{ name = "Your Name", email = "[email protected]" }]
maintainers = [
    { name = "Alice", email = "[email protected]" },
    { name = "Bob", email = "[email protected]" },
]
The meaning of the fields is somewhat open to interpretation. If you start a
new project, I recommend including yourself under authors and
omitting the maintainers field. Long-lived open source projects
typically list the original author under authors , while the people in
charge of ongoing project maintenance appear as maintainers .
[project]
description = "Display extracts from random Wikipedia articles"
[project]
readme = "README.md"
Instead of a string, you can also specify a table with file and
content-type keys.
[project]
readme = { file = "README", content-type = "text/markdown" }
You can even embed the long description in the pyproject.toml file using the
text key.
[project.readme]
content-type = "text/markdown"
text = """
# random-wikipedia-article
...
"""
[project]
keywords = ["wikipedia"]
[project]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Environment :: Console",
    "Topic :: Games/Entertainment :: Fortune Cookies",
]
(Table: classifier groups, with columns Classifier Group, Description, and
Example.)
[project.urls]
Homepage = "https://yourname.dev/projects/random-wikipedia-article"
Source = "https://github.com/yourname/random-wikipedia-article"
Issues = "https://github.com/yourname/random-wikipedia-article/issues"
Documentation = "https://readthedocs.io/random-wikipedia-article"
The License
[project]
license = { text = "MIT" }
classifiers = ["License :: OSI Approved :: MIT License"]
I recommend using the text key with a SPDX license identifier such as
“MIT” or “Apache-2.0”. The Software Package Data Exchange (SPDX) is
an open standard backed by the Linux Foundation for communicating
software bill of material information, including licenses.
NOTE
As of this writing, a Python Enhancement Proposal (PEP) is under discussion that changes the
license field to a string using SPDX syntax and adds a license-files key for license files
distributed with the package: PEP 639.
If you’re unsure which open source license to use for your project,
choosealicense.com provides some useful guidance. For a proprietary
project, it’s common to specify “proprietary”. You can also add the special
classifier Private :: Do Not Upload to prevent accidental upload
to PyPI.
[project]
license = { text = "proprietary" }
classifiers = [
"License :: Other/Proprietary License",
"Private :: Do Not Upload",
]
[project]
requires-python = ">=3.8"
Most commonly, people specify the minimum Python version as a lower
bound, using a string with the format >=3.x . The syntax of this field is
more general and follows the same rules as version specifiers for project
dependencies (see Chapter 4).
Tools like Nox and tox make it easy to run checks across multiple Python
versions, helping you ensure that the field reflects reality. As a baseline, I
recommend requiring the oldest Python version that still receives security
updates. You can find the end-of-life dates for all current and past Python
versions on the Python Developer Guide.
There are three main reasons to be more restrictive about the Python
version. First, your code may depend on newer language features—for
example, structural pattern matching was introduced in Python 3.10.
Second, your code may depend on newer features in the standard library—
look out for the “Changed in version 3.x” notes in the official
documentation. Third, it could depend on third-party packages with more
restrictive Python requirements.
WARNING
Don’t specify an upper bound for the required Python version unless you know that your package is
not compatible with any higher version. Upper bounds cause disruption in the ecosystem when a new
version is released.
Summary
Packaging allows you to publish releases of your Python projects, using
source distributions (sdists) and built distributions (wheels). These artifacts
contain your Python modules, together with project metadata, in an archive
format that end users can easily install into their environments. The
standard pyproject.toml file defines the build system for a Python project as
well as the project metadata. Build frontends like build , pip, and uv use
the build system information to install and run the build backend in an
isolated environment. The build backend assembles an sdist and wheel from
the source tree and embeds the project metadata. You can upload packages
to the Python Package Index (PyPI) or a private repository, using a tool like
Twine. The Python project manager Rye provides a more integrated
workflow on top of these tools.
1
Even the venerable Comprehensive Perl Archive Network (CPAN) didn’t exist in February 1991,
when Guido van Rossum published the first release of Python on Usenet.
2
By default, the build tool builds the wheel from the sdist instead of the source tree, to ensure that
the sdist is valid. Build backends can request additional build dependencies using the
get_requires_for_build_wheel and get_requires_for_build_sdist build
hooks.
3
Python’s packaging ecosystem is also a great demonstration of Conway’s law. In 1967, Melvin
Conway—an American computer scientist also known for developing the concept of coroutines—
observed that organizations will design systems that are copies of their communication structure.
4
This is especially true given the existence of typosquatting—where an attacker uploads a malicious
package whose name is similar to a popular package—and dependency confusion attacks—where a
malicious package on a public server uses the same name as a package on a private company
repository.
5
This nifty technique comes courtesy of my reviewer Hynek Schlawack.
6
In case you’re wondering, the +g6b80314 suffix is a local version identifier that designates
downstream changes, in this case using output from the command git describe .
7
Test fixtures set up objects that you need to run repeatable tests against your code.
8
You can also add Trove classifiers for each supported Python version. Some backends backfill
classifiers for you—Poetry does this out of the box for Python versions and project licenses.
Chapter 4. Dependency Management
Many projects also use third-party tools for developer tasks—like running
the test suite or building documentation. These packages are known as
development dependencies: end users don’t need them to run your code. A
related case is the build dependencies from Chapter 3, which let you create
packages for your project.
Example 4-1 shows how you can use httpx to send a request to the
Wikipedia API with a User-Agent header.
send a User-Agent header with your requests. But httpx offers a
more intuitive, explicit, and flexible interface, even when you’re not using
any of its advanced features. Try it out:
import textwrap

import httpx

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
USER_AGENT = "random-wikipedia-article/0.1 (Contact: [email protected])"

def main():
    headers = {"User-Agent": USER_AGENT}
    with httpx.Client(headers=headers, follow_redirects=True) as client:
        response = client.get(API_URL)
        data = response.json()

    print(data["title"], end="\n\n")
    print(textwrap.fill(data["extract"]))
The json method abstracts the details of parsing the response body
as JSON.
While you’re at it, let’s improve the look and feel of the program.
Example 4-2 uses Rich, a library for console output, to display the article
title in bold. That hardly scrapes the surface of Rich’s formatting options.
Modern terminals are surprisingly capable, and Rich lets you leverage their
potential with ease. Take a look at its official documentation for details.
import httpx
from rich.console import Console

def main():
    ...
    console = Console(width=72, highlight=False)
    console.print(data["title"], style="bold", end="\n\n")
    console.print(data["extract"])
The style keyword allows you to set the title apart using a bold
font.
$ uv venv
$ uv pip install --editable .
You may be tempted to install httpx and rich manually into the
environment. Instead, add them to the project dependencies in
pyproject.toml. This ensures that whenever you install your project, the two
packages are installed along with it.
[project]
name = "random-wikipedia-article"
version = "0.1"
dependencies = ["httpx", "rich"]
...
If you reinstall the project, you’ll see that uv installs its dependencies as
well:
Version Specifiers
[project]
dependencies = ["httpx>=0.27.0", "rich>=13.7.1"]
[project]
dependencies = ["awesome>=1.2,<2"]
WARNING
Excluding versions after the fact has a pitfall that you need to be aware of. Dependency resolvers can
decide to downgrade your project to a version without the exclusion and upgrade the dependency
anyway. Lock files can help with this.
UPPER VERSION BOUNDS IN PYTHON
Even in clear cases, a breaking change will only break your project if it
affects the part of the public API that your project uses. By contrast, many
changes that will break your project aren’t marked by a version number:
they’re just bugs. In the end, you’ll still rely on automated tests to discover
“bad” versions and deal with them after the fact.
Extras
Suppose you want to use the newer HTTP/2 protocol with httpx . This
only requires a small change to the code that creates the HTTP client:
def main():
    headers = {"User-Agent": USER_AGENT}
    with httpx.Client(headers=headers, http2=True) as client:
        ...
Under the hood, httpx delegates the gory details of speaking HTTP/2 to
another package, h2 . That dependency is not pulled in by default,
however. This way, users who don’t need the newer protocol get away with
a smaller dependency tree. You do need it here, so activate the optional
feature using the syntax httpx[http2] :
[project]
dependencies = ["httpx[http2]>=0.27.0", "rich>=13.7.1"]
Optional dependencies
Let’s take a look at this situation from the point of view of httpx . The
h2 and brotli dependencies are optional, so httpx declares them
under optional-dependencies instead of dependencies
(Example 4-3).
Example 4-3. Optional dependencies of httpx (simplified)
[project]
name = "httpx"
[project.optional-dependencies]
http2 = ["h2>=3,<5"]
brotli = ["brotli"]
try:
    import h2
except ImportError:
    h2 = None
Environment Markers
from importlib.metadata import metadata

def build_user_agent():
    fields = metadata("random-wikipedia-article")
    return USER_AGENT.format_map(fields)

def main():
    headers = {"User-Agent": build_user_agent()}
    ...
The metadata function retrieves the core metadata fields for the
package.
Not quite. Fortunately, many additions to the standard library come with
backports—third-party packages that provide the functionality for older
interpreters. For importlib.metadata , you can fall back to the
importlib-metadata backport from PyPI. The backport remains
useful because the library changed several times after its introduction.
You only need backports in environments that use specific Python versions.
An environment marker lets you express this as a conditional dependency:
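Here's what such a specification looks like; it's the same entry that appears in the full dependencies field a little further below:
importlib-metadata>=7.0.2; python_version < '3.8'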
Installers will only install the package on interpreters older than Python 3.8.
(Table: environment markers, with columns Environment Marker, Standard
Library, Description, and Examples. The python_version and
implementation_version markers apply transformations; see PEP 508 for
details.)
Going back to Example 4-4, here’s the full dependencies field with
the python_version marker for importlib-metadata :
[project]
dependencies = [
    "httpx[http2]>=0.27.0",
    "rich>=13.7.1",
    "importlib-metadata>=7.0.2; python_version < '3.8'",
]
Did I just hear somebody shout “EAFP”? If your imports depend on the
Python version, it’s better to avoid the technique from “Optional
dependencies” and “look before you leap.” An explicit version check
communicates your intent to static analyzers, such as the mypy type
checker (see Chapter 10). EAFP may result in errors from these tools
because they can’t detect when each module is available.
Markers support the same equality and comparison operators as version
specifiers (Table 4-1). Additionally, you can use in and not in to
match a substring against the marker. For example, the expression 'arm'
in platform_version checks if platform.version()
contains the string 'arm' .
You can also combine multiple markers using the boolean operators and
and or . Here’s a rather contrived example combining all these features:
[project]
dependencies = ["""
    awesome-package; python_full_version <= '3.8.1' \
        and (implementation_name == 'cpython' or implementation_name == 'pypy') \
        and sys_platform == 'darwin' \
        and 'arm' in platform_version
"""]
The example also relies on TOML’s support for multi-line strings, which
uses triple quotes just like Python. Dependency specifications cannot span
multiple lines, so you have to escape the newlines with a backslash.
Development Dependencies
Development dependencies are third-party packages that you require during
development. As a developer, you might use the pytest testing framework to
run the test suite for your project, the Sphinx documentation system to build
its docs, or a number of other tools to help with project maintenance. Your
users, on the other hand, don’t need to install any of these packages to run
your code.
def test_build_user_agent():
    assert "random-wikipedia-article" in build_user_agent()
Example 4-5 only uses built-in Python features, so you could just import
and run the test manually. But even for this tiny test, pytest adds three
useful features. First, it discovers modules and functions whose names start
with test , so you can run your tests by invoking pytest without
arguments. Second, pytest shows tests as it executes them, as well as a
summary with the test results at the end. Third, pytest rewrites assertions in
your tests to give you friendly, informative messages when they fail.
Let’s run the test with pytest. I’m assuming you already have an active
virtual environment with an editable install of your project. Enter the
commands below to install and run pytest in that environment:
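The exact commands depend on your setup; assuming you use uv as in the earlier sections, they would look roughly like this:
$ uv pip install pytest
$ py -m pytest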
tests/test_random_wikipedia_article.py .
For now, things look great. Tests help your project evolve without breaking
things. The test for build_user_agent is a first step in that direction.
Installing and running pytest is a small infrastructure cost compared to these
long-term benefits.
Setting up a project environment becomes harder as you acquire more
development dependencies—documentation generators, linters, code
formatters, type checkers, or other tools. Even your test suite may require
more than pytest: plugins for pytest, tools for measuring code coverage, or
just packages that help you exercise your code.
You also need compatible versions of these packages—your test suite may
require the latest version of pytest, while your documentation may not build
on the new Sphinx release. Each of your projects may have slightly
different requirements. Multiply this by the number of developers working
on each project, and it becomes clear that you need a way to track your
development dependencies.
Optional Dependencies
Furthermore, you can’t install extras without the project itself. By contrast,
not all developer tools need your project installed. For example, linters
analyze your source code for bugs and potential improvements. You can run
them on a project without installing it into the environment. Besides
wasting time and space, “fat” environments constrain dependency
resolution unnecessarily. For example, many Python projects could no
longer upgrade important dependencies when the Flake8 linter put a version
cap on importlib-metadata .
Keeping this in mind, extras are widely used for development dependencies
and are the only method covered by a packaging standard. They’re a
pragmatic choice, especially if you manage linters with pre-commit (see
Chapter 9). Example 4-6 shows how you’d use extras to track packages
required for testing and documentation.
[project.optional-dependencies]
tests = ["pytest>=8.1.1", "pytest-sugar>=1.0.0"]
docs = ["sphinx>=7.2.6"]
You can now install the test dependencies using the tests extra:
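For example, assuming an editable install from the project directory:
$ uv pip install -e ".[tests]"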
You can also define a dev extra with all the development dependencies.
This lets you set up a development environment in one go, with your
project and every tool it uses:
[project.optional-dependencies]
tests = ["pytest>=8.1.1", "pytest-sugar>=1.0.0"]
docs = ["sphinx>=7.2.6"]
dev = ["random-wikipedia-article[tests,docs]"]
Requirements Files
pytest>=8.1.1
pytest-sugar>=1.0.0
sphinx>=7.2.6
You can install the dependencies listed in a requirements file using pip or
uv:
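For example, assuming the listing above is saved as requirements.txt in the current directory:
$ uv pip install -r requirements.txt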
# requirements/tests.txt
-e .
pytest>=8.1.1
pytest-sugar>=1.0.0
# requirements/docs.txt
sphinx>=7.2.6
# requirements/dev.txt
-r tests.txt
-r docs.txt
The tests.txt file requires an editable install of the project because the
test suite needs to import the application modules.
The docs.txt file doesn’t require the project. (That’s assuming you
build the documentation from static files only. If you use the
autodoc Sphinx extension to generate API documentation from
docstrings in your code, you’ll also need the project here.)
NOTE
If you include other requirements files using -r , their paths are evaluated relative to the including
file. By contrast, paths to dependencies are evaluated relative to your current directory, which is
typically the project directory.
Create and activate a virtual environment, then run the following commands
to install the development dependencies and run the test suite:
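Assuming the layout from Example 4-9, with the requirements files stored in a requirements directory, that might look like this:
$ uv pip install -r requirements/dev.txt
$ py -m pytest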
Locking Dependencies
You’ve installed your dependencies in a local environment or in continuous
integration (CI), and you’ve run your test suite and any other checks you
have in place. Everything looks good, and you’re ready to deploy your
code. But how do you install the same packages in production that you used
when you ran your checks?
WARNING
Supply chain attacks infiltrate a system by targeting its third-party dependencies. For example, in
2022, a threat actor dubbed “JuiceLedger” uploaded malicious packages to legitimate PyPI projects
after compromising them with a phishing campaign.⁶
There are many reasons why environments end up with different packages
given the same dependency specifications. Most of them fall into two
categories: upstream changes and environment mismatch. First, you can get
different packages if the set of available packages changes upstream, for
example when projects publish new releases or yank existing ones.
You need a way to define the exact set of packages required by your
application, and you want its environment to be an exact image of this
package inventory. This process is known as locking, or pinning, the project
dependencies, which are listed in a lock file.
So far, I’ve talked about locking dependencies for reliable and reproducible
deployments. Locking is also beneficial during development, for both
applications and libraries. By sharing a lock file with your team and with
contributors, you put everybody on the same page: every developer uses the
same dependencies when running the test suite, building the documentation,
or performing other tasks. Using the lock file for mandatory checks avoids
surprises where checks fail in CI after passing locally. To reap these
benefits, lock files must include development dependencies, too.
In this section, I’ll introduce two methods for locking dependencies using
requirements files: freezing and compiling requirements. In Chapter 5, I’ll
describe Poetry’s lock files.
“LOCKING” DEPENDENCIES IN THE PROJECT METADATA
If you want to lock the dependencies for an application, why not narrow
down the version constraints in pyproject.toml? For example, couldn’t you
lock the dependencies on httpx and rich as shown below?
[project]
dependencies = ["httpx[http2]==0.27.0", "rich==13.7.1"]
Requirements files are a popular format for locking dependencies. They let
you keep the dependency information separate from the project metadata.
Pip and uv can generate these files from an existing environment:
$ uv pip install .
$ uv pip freeze
anyio==4.3.0
certifi==2024.2.2
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.4
httpx==0.27.0
hyperframe==6.0.1
idna==3.6
markdown-it-py==3.0.0
mdurl==0.1.2
pygments==2.17.2
random-wikipedia-article @ file:///Users/user/ran
rich==13.7.1
sniffio==1.3.1
When deploying your project to production, you can install the project and
its dependencies like this:
TIP
Lock your dependencies on the same Python version, Python implementation, operating system, and
processor architecture as those used in production. If you deploy to multiple environments, generate a
requirements file for each one.
The pip-tools project lets you lock dependencies without these limitations.
You can compile requirements directly from pyproject.toml, without
installing the packages. Under the hood, pip-tools leverages pip and its
dependency resolver.
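For example, a sketch that compiles the project dependencies from pyproject.toml into a requirements file, run via pipx just like pip-sync later in this section:
$ pipx run --spec=pip-tools pip-compile --output-file=requirements.txt pyproject.toml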
Pip-tools and uv annotate the file to indicate the dependent package for
each dependency, as well as the command used to generate the file. There’s
one more difference to the output of pip freeze : the compiled
requirements don’t include your own project. You’ll have to install it
separately after applying the requirements file.
httpx==0.27.0 \
--hash=sha256:71d5465162c13681bff01ad59b2cc68dd83
--hash=sha256:a0cb88a46f32dc874e04ee956e4c2764aba
Package hashes make installations more deterministic and reproducible.
They’re also an important tool in organizations that require screening every
artifact that goes into production. Validating the integrity of packages
prevents on-path attacks where a threat actor (“man in the middle”)
intercepts a package download to supply a compromised artifact.
Hashes also have the side effect that pip refuses to install packages without
them: either all packages have hashes, or none do. As a consequence,
hashes protect you from installing files that aren’t listed in the requirements
file.
Install the requirements file in the target environment using pip or uv,
followed by the project itself. You can harden the installation using a couple
of options: the option --no-deps ensures that you only install packages
listed in the requirements file, and the option --no-cache prevents the
installer from reusing downloaded or locally built artifacts.
For example, here’s how you’d upgrade Rich to the latest version:
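A sketch of that invocation, assuming the compiled requirements live in requirements.txt:
$ pipx run --spec=pip-tools pip-compile --upgrade-package=rich --output-file=requirements.txt pyproject.toml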
So far, you’ve created the target environment from scratch. You can also
use pip-sync to synchronize the target environment with the updated
requirements file. Don’t install pip-tools in the target environment for this:
its dependencies may conflict with those of your project. Instead, use pipx,
as you did with pip-compile . Point pip-sync to the target
interpreter using its --python-executable option:
$ pipx run --spec=pip-tools pip-sync --python-exe
The command removes the project itself since it’s not listed in the
requirements file. Re-install it after synchronizing:
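For example, assuming the project directory is the current directory:
$ py -m pip install --no-deps .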
If you have finer-grained extras, the process is the same. You may want to
store the requirements files in a requirements directory to avoid clutter.
If you specify your development dependencies in requirements files instead
of extras, compile each of these files in turn. By convention, input
requirements use the .in extension, while output requirements use the .txt
extension (Example 4-10).
# requirements/tests.in
pytest>=8.1.1
pytest-sugar>=1.0.0
# requirements/docs.in
sphinx>=7.2.6
# requirements/dev.in
-r tests.in
-r docs.in
Unlike Example 4-9, the input requirements don’t list the project itself. If
they did, the output requirements would include the path to the project—
and every developer would end up with a different path. Instead, pass
pyproject.toml together with the input requirements to lock the entire set of
dependencies together:
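A sketch of that invocation for the dev requirements; the tests and docs requirements follow the same pattern:
$ pipx run --spec=pip-tools pip-compile --output-file=requirements/dev.txt pyproject.toml requirements/dev.in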
Why bother compiling dev.txt at all? Can’t it just include docs.txt and
tests.txt? If you install separately locked requirements on top of each other,
they may well end up conflicting. Let the dependency resolver see the full
picture. If you pass all the input requirements, it can give you a consistent
dependency tree in return.
Summary
In this chapter, you’ve learned how to declare project dependencies using
pyproject.toml and how to declare development dependencies using either
extras or requirements files. You’ve also learned how to lock dependencies
for reliable deployments and reproducible checks using pip-tools. In the
next chapter, you’ll see how the project manager Poetry helps with
dependency management using dependency groups and lock files.
1
In a wider sense, the dependencies of a project consist of all software packages that users require to
run its code—including the interpreter, the standard library, third-party packages, and system
libraries. Conda and distro-level package managers like APT, DNF, and Homebrew support this
generalized notion of dependencies.
2
Henry Schreiner: “Should You Use Upper Bound Version Constraints?,” December 9, 2021.
3
For simplicity, the code doesn’t handle multiple authors—which one ends up in the header is
undefined.
4
Robert Collins: “PEP 508 – Dependency specification for Python Software Packages,” November
11, 2015.
5
Stephen Rosen: “PEP 735 – Dependency Groups in pyproject.toml,” November 20, 2023.
6
Dan Goodin: “Actors behind PyPI supply chain attack have been active since late 2021,” September
2, 2022.
7
Natalie Weizenbaum: “PubGrub: Next-Generation Version Solving,” April 2, 2018
8
Brett Cannon: “Lock files, again (but this time w/ sdists!),” February 22, 2024.
9
Uninstalling the package isn’t enough: the installation can have side effects on your dependency
tree. For example, it may upgrade or downgrade other packages or pull in additional dependencies.
Chapter 5. Managing Projects with
Poetry
The Python project manager Poetry was addressing these problems before
some of the standards governing pyproject.toml took shape. Its friendly
command-line interface lets you perform most tasks related to packaging,
dependencies, and environments. Poetry brings its own standards-compliant
build backend, poetry.core —but you can remain blissfully unaware
of this fact. It also comes with a strict dependency resolver and locks all
dependencies by default, behind the scenes.
A decade ago, Python packaging was firmly in the hands of three tools:
setuptools, virtualenv, and pip. You’d use setuptools to create Python
packages, virtualenv to set up virtual environments, and pip to install
packages into them. Everybody did. Around 2016—the same year that the
pyproject.toml file became standard—things started to change.
Poetry, started in 2018 by Sébastien Eustace, was the first tool to provide a
unified approach to packaging, dependencies, and environments—and
quickly became widely adopted. Two other tools follow a similarly holistic
approach: PDM, started by Frost Ming in 2019, and Hatch by Ofek Lev in
2017. Hatch has recently grown in popularity, especially among tooling and
library developers. In 2023, they were joined by Rye, a project manager
written in Rust by Armin Ronacher. In addition, Hatch and Rye also
manage Python installations, leveraging the Python Standalone Builds
project.
Poetry, Hatch, PDM, and Rye each give you an integrated workflow for
managing Python packages, environments, and dependencies. As such,
they’ve come to be known as Python project managers. Keep an eye on
Astral’s uv as well!
Installing Poetry
Install Poetry globally using pipx, to keep its dependencies isolated from
the rest of the system:
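A sketch of the command, assuming you want Poetry to run on a specific interpreter:
$ pipx install --python=python3.12 poetry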
You can omit the --python option if pipx already uses the new Python
version (see “Configuring Pipx”).
When a prerelease of Poetry becomes available, you can install it side-by-
side with the stable version:
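The invocation might look like this:
$ pipx install --suffix=@preview --pip-args=--pre poetry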
Above, I’ve used the --suffix option to rename the command so you
can invoke it as poetry@preview , while keeping poetry as the
stable version. The --pip-args option lets you pass options to pip, like
--pre for including prereleases.
NOTE
Poetry also comes with an official installer, which you can download and run with Python. It’s not as
flexible as pipx, but it provides a readily available alternative:
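For reference, the installer is typically run like this (check the official documentation for current instructions):
$ curl -sSL https://install.python-poetry.org | python3 -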
Creating a Project
You can create a new project using the command poetry new . As an
example, I’ll use the random-wikipedia-article project from
previous chapters. Run the following command in the parent directory
where you want to keep your new project:
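A sketch of the command; the --src option requests the src layout shown below:
$ poetry new --src random-wikipedia-article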
After running this command, you’ll see that Poetry created a project
directory named random-wikipedia-article, with the following structure:
random-wikipedia-article
├── README.md
├── pyproject.toml
├── src
│ └── random_wikipedia_article
│ └── __init__.py
└── tests
└── __init__.py
Until a few years ago, package authors placed the import package directly
in the project directory. These days, a project layout with src, tests, and docs
directories at the top is becoming more common.
Keeping the import package tucked away under src has practical
advantages. During development, the current directory often appears at the
start of sys.path . Without an src layout, you may be importing your
project from its source code, not from the package you’ve installed in the
project environment. In the worst case, your tests could fail to detect issues
in a release you’re about to publish.
On the other hand, whenever you do want to run the code straight from the source tree, editable installs give you that by design. With an src layout, packaging tools can implement editable installs by adding the src directory to sys.path —without the side effect of making unrelated Python files importable. Here's the pyproject.toml that poetry new generated for the project:
[tool.poetry]
name = "random-wikipedia-article"
version = "0.1.0"
description = ""
authors = ["Your Name <[email protected]>"]
readme = "README.md"
packages = [{include = "random_wikipedia_article", from = "src"}]
[tool.poetry.dependencies]
python = "^3.12"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Example 5-2 fills in the metadata for the project. I’ve highlighted some
differences from Example 3-4. (You’ll use the command-line interface to
add the dependencies later.)
[tool.poetry]
name = "random-wikipedia-article"
version = "0.1.0"
description = "Display extracts from random Wikipedia articles"
keywords = ["wikipedia"]
license = "MIT"
classifiers = [
"License :: OSI Approved :: MIT License",
"Development Status :: 3 - Alpha",
"Environment :: Console",
"Topic :: Games/Entertainment :: Fortune Cook
]
authors = ["Your Name <[email protected]>"]
readme = "README.md"
homepage = "https://yourname.dev/projects/random-wikipedia-article"
repository = "https://github.com/yourname/random-wikipedia-article"
documentation = "https://readthedocs.io/random-wikipedia-article"
packages = [{include = "random_wikipedia_article", from = "src"}]
[tool.poetry.dependencies]
python = ">=3.10"
[tool.poetry.urls]
Issues = "https://github.com/yourname/random-wikipedia-article/issues"
[tool.poetry.scripts]
random-wikipedia-article = "random_wikipedia_article:main"
Poetry has dedicated fields for some project URLs, namely its
homepage, repository, and documentation; for other URLs, there’s
also a generic urls table.
You can validate the project metadata with the poetry check command:
$ poetry check
All set!
Poetry allows you to specify which files and directories to include in the
distribution—a feature still missing from the pyproject.toml standards
(Table 5-2).
The include and exclude fields allow you to list other files to
include in, or exclude from, the distribution. Poetry seeds the exclude
field using the .gitignore file, if present. Instead of a string, you can also use
a table with path and format keys for sdist-only or wheel-only files.
Example 5-3 shows how to include the test suite in source distributions.
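A sketch of what that entry might look like, using Poetry's include field with an sdist-only item:
[tool.poetry]
include = [{path = "tests", format = "sdist"}]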
Copy the contents of Example 5-4 into the __init__.py file in the new
project.
from importlib.metadata import metadata

import httpx
from rich.console import Console

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
USER_AGENT = "{Name}/{Version} (Contact: {Author-email})"

def main():
    fields = metadata("random-wikipedia-article")
    headers = {"User-Agent": USER_AGENT.format_map(fields)}
    ...
Use poetry add to add Rich to the project:
$ poetry add rich
Updating dependencies
Resolving dependencies... (0.2s)
If you inspect pyproject.toml after running this command, you’ll find that
Poetry has added Rich to the dependencies table (Example 5-5):
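The added entry presumably looks like this (the exact version depends on the latest Rich release at the time):
[tool.poetry.dependencies]
python = "^3.12"
rich = "^13.7.1"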
Poetry also installs the package into an environment for the project. If you
already have a virtual environment in .venv, Poetry uses that. Otherwise, it
creates a virtual environment in a shared location (see “Managing
Environments”).
Caret Constraints
By default, poetry add uses a caret constraint for the new dependency:
rich = "^13.7.1"
A caret constraint allows releases up to the next major version, so it's equivalent to this version range:
rich = ">=13.7.1,<14"
On the other hand, tilde constraints typically exclude minor releases:
rich = "~13.7.1"
rich = ">=13.7.1,==13.7.*"
You can also remove the upper bound from an existing caret constraint by re-adding the dependency with an explicit lower bound, such as poetry add "rich>=13.7.1" . If you specified extras or markers when you first added the dependency, you'll need to specify them again.
SHOULD YOU CAP DEPENDENCIES?
The situation is similar but worse for the Python requirement. Excluding
Python 4 by default will cause disruption across the ecosystem when the
core Python team eventually releases a new major version. It’s unlikely that
Python 4 will come anywhere near Python 3 in terms of incompatible
changes. Poetry’s constraint is contagious in the sense that dependent
packages must also introduce it. And it’s impossible for Python package
installers to satisfy—they can’t downgrade the environment to an earlier
version of Python.
[tool.poetry.dependencies]
python = ">=3.10"
rich = ">=13.7.1"
httpx = {version = ">=0.27.0", extras = ["http2"]}
[tool.poetry.dependencies]
awesome = {version = ">=1", markers = "implementation_name == 'pypy'"}
[[package]]
name = "rich"
version = "13.7.1"
python-versions = ">=3.7.0"
dependencies = {markdown-it-py = ">=2.2.0", pygme
files = [
{file = "rich-13.7.1-py3-none-any.whl", hash
{file = "rich-13.7.1.tar.gz", hash = "sha256
]
Use the command poetry show to display the locked dependencies in
the terminal. Here’s what the output looked like after I added Rich:
$ poetry show
markdown-it-py 3.0.0 Python port of markdown-it
mdurl 0.1.2 Markdown URL utilities
pygments 2.17.2 Pygments is a syntax highli
rich 13.7.1 Render rich text, tables, p
Poetry’s lock file is designed to work across operating systems and Python
interpreters. Having a single environment-independent, or “universal”, lock
file is beneficial if your code must run in diverse environments, or if you’re
an open source maintainer with contributors from all over the world.
Updating Dependencies
You can update all dependencies in the lock file to their latest versions
using a single command:
$ poetry update
If you no longer need a package for your project, remove it with poetry
remove :
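For example, assuming a package named some-package that you no longer use (the name is just a placeholder):
$ poetry remove some-package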
Managing Environments
Poetry’s add , update , and remove commands don’t just update
dependencies in the pyproject.toml and poetry.lock files. They also
synchronize the project environment with the lock file by installing,
updating, or removing packages. Poetry creates the virtual environment for
the project on demand.
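I recommend telling Poetry to create the environment inside the project, in a .venv directory; this is standard Poetry configuration:
$ poetry config virtualenvs.in-project true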
This setting makes the environment discoverable for other tools in the
ecosystem, such as py and uv . Having the directory in the project is
convenient when you need to examine its contents. Although the setting
restricts you to a single environment, this limitation is seldom a concern.
Tools like Nox and tox are tailor-made for testing across multiple
environments (see Chapter 8).
You can check the location of the current environment using the command
poetry env info --path . If you want to create a clean slate for
your project, use the following commands to remove existing environments
and create a new one using the specified Python version:
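A sketch of these commands, assuming you want a Python 3.12 environment:
$ poetry env remove --all
$ poetry env use 3.12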
Before you use the environment, you should install the project. Poetry
performs editable installs, so the environment reflects any code changes
immediately:
$ poetry install
Enter the project environment by launching a shell session with poetry
shell . Poetry activates the virtual environment using the activation script
for your current shell. With the environment activated, you can run the
application from the shell prompt. Just exit the shell session when you’re
done:
$ poetry shell
(random-wikipedia-article-py3.12) $ random-wikipedia-article
(random-wikipedia-article-py3.12) $ exit
You can also run the application in your current shell session, using the
command poetry run :
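For example, using the entry-point script from the project metadata:
$ poetry run random-wikipedia-article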
The command is also handy for starting an interactive Python session in the
project environment:
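For example:
$ poetry run python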
When you run a program with poetry run , Poetry activates the virtual environment without launching a shell. It does so by prepending the environment's script directory to the program's PATH and setting its VIRTUAL_ENV variable (see “Activation scripts”).
TIP
Just type py to get a Python session for your Poetry project on Linux and macOS. This requires the
Python Launcher for Unix, and you must configure Poetry to use in-project environments.
Dependency Groups
Poetry allows you to declare development dependencies, organized in
dependency groups. Dependency groups aren’t part of the project metadata
and are invisible to end users. Let’s add the dependency groups from
“Development Dependencies”:
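A sketch of the commands, assuming the same packages as in that section:
$ poetry add --group=tests pytest pytest-sugar
$ poetry add --group=docs sphinx
Afterwards, pyproject.toml contains sections like these: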
[tool.poetry.group.tests.dependencies]
pytest = "^8.1.1"
pytest-sugar = "^1.0.0"
[tool.poetry.group.docs.dependencies]
sphinx = "^7.2.6"
You can mark a group as optional, so that poetry install skips it unless you request it explicitly:
[tool.poetry.group.docs]
optional = true
[tool.poetry.group.docs.dependencies]
sphinx = "^7.2.6"
WARNING
Don’t specify the --optional flag when you add a dependency group with poetry add —it
doesn’t mark the group as optional. The option designates optional dependencies that are behind an
extra; it has no valid use in the context of dependency groups.
The poetry install command has several options that provide finer-
grained control over which dependencies are installed into the project
environment (Table 5-3).
Table 5-3. Installing dependencies with poetry install
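A few common invocations give the idea (options as documented by Poetry):
$ poetry install --with docs      # include an optional group
$ poetry install --without tests  # exclude a group
$ poetry install --only docs      # install only the given group
$ poetry install --no-root        # skip installing the project itself
$ poetry install --sync           # synchronize the environment with the lock file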
NOTE
If you’re following along in this section, please don’t upload the example project to PyPI. Use the
TestPyPI repository instead—it’s a playground for testing, learning, and experimentation.
Before you can upload packages to PyPI, you need an account and an API
token to authenticate with the repository, as explained in “Uploading
Packages with Twine”. Next, add the API token to Poetry:
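A sketch of the command; replace the placeholder with your actual token:
$ poetry config pypi-token.pypi <your-token>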
You can create packages for a Poetry project using standard tooling like
build or with Poetry’s command-line interface:
$ poetry build
Building random-wikipedia-article (0.1.0)
- Building sdist
- Built random_wikipedia_article-0.1.0.tar.gz
- Building wheel
- Built random_wikipedia_article-0.1.0-py3-none
Like build , Poetry places the packages in the dist directory. You can
publish the packages in dist using poetry publish :
$ poetry publish
You can now specify the repository when publishing your project. Feel free
to try this out with your own version of the example project:
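A sketch of the steps, assuming you name the repository testpypi and authenticate with an API token ( __token__ is the username convention for API tokens):
$ poetry config repositories.testpypi https://test.pypi.org/legacy/
$ poetry config http-basic.testpypi __token__
$ poetry publish --repository testpypi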
The command prompts you for the password and stores it in the system
keyring, if available, or in the auth.toml file on disk.
Poetry also supports repositories that are secured by mutual TLS or use a
custom certificate authority; see the official documentation for details.
Above, you’ve seen how to upload your package to repositories other than
PyPI. Poetry also supports alternate repositories on the consumer side: you
can add packages to your project from sources other than PyPI. While
upload targets are a user setting and stored in the Poetry configuration,
package sources are a project setting, stored in pyproject.toml.
You configure credentials for package sources just like you do for
repositories:
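For example, for a source named internal (a placeholder name):
$ poetry config http-basic.internal <username>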
The following command lists the package sources for the project:
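Poetry provides the source show subcommand for this:
$ poetry source show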
WARNING
Specify the source when adding packages from supplemental sources. Otherwise, Poetry searches all
sources when looking up a package. An attacker could upload a malicious package to PyPI with the
same name as your internal package (dependency confusion attack).
If a plugin affects the build stage of your project, add it to the build
dependencies in pyproject.toml, as well. See “The Dynamic Versioning
Plugin” for an example.
If you no longer need the plugin, remove it from the injected packages:
If you’re no longer sure which plugins you have installed, list them like
this:
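A sketch using pipx, assuming you had injected the export plugin (discussed below) into Poetry's environment:
$ pipx uninject poetry poetry-plugin-export
$ pipx list --include-injected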
Poetry’s lock file is great to ensure that everybody on your team, and every
deployment environment, ends up with the same dependencies. But what do
you do if you can’t use Poetry in some context? For example, you may need
to deploy your project on a system that only has a Python interpreter and
the bundled pip.
As of this writing, there's no lock file standard in the wider Python world; each packaging tool that supports lock files implements its own format.
None of these lock file formats has support in pip. But we do have
requirements files.
Requirements files let you pin packages to an exact version, require their
artifacts to match cryptographic hashes, and use environment markers to
restrict packages to specific Python versions and platforms. Wouldn’t it be
nice if you could generate one from your poetry.lock, for interoperability
with non-Poetry environments? This is precisely what the export plugin
achieves.
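A sketch of the workflow with pipx and the plugin's export command (option names per the poetry-plugin-export documentation):
$ pipx inject poetry poetry-plugin-export
$ poetry export --output=requirements.txt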
Distribute the requirements file to the target system and use pip to install
the dependencies (typically followed by installing a wheel of your project).
In the previous section, you saw how to deploy your project on a system
without Poetry. If you do have Poetry available, you might be wondering:
can you just deploy with poetry install ? You could, but Poetry
performs an editable install of your project—you’ll be running your
application from the source tree. That may not be acceptable in a production
environment. Editable installs also limit your ability to ship the virtual
environment to another destination.
The bundle plugin allows you to deploy your project and locked
dependencies to a virtual environment of your choosing. It creates the
environment, installs the dependencies from the lock file, then builds and
installs a wheel of your project.
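A sketch of the commands, bundling into a directory named app to match the invocation below:
$ pipx inject poetry poetry-plugin-bundle
$ poetry bundle venv app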
Test the environment by running the entry-point script for the application:
$ app/bin/random-wikipedia-article
You can use the bundle plugin to create a minimal Docker image for
production. Docker supports multi-stage builds, where the first stage builds
the application in a full-fledged build environment, and the second stage
copies the build artifacts over into a minimal runtime environment. This
allows you to ship slim images to production, speeding up deployments and
reducing bloat and potential vulnerabilities in your production
environments.
In Example 5-7, the first stage installs Poetry and the bundle plugin, copies
the Poetry project, and bundles it into a self-contained virtual environment.
The second stage copies the virtual environment into a minimal Python
image.
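A sketch of what the build stage might look like, assuming Debian's pipx package and the bundle plugin (paths and package names are illustrative):
FROM debian:12-slim AS builder

# Install pipx (pulls in Python 3), then Poetry and the bundle plugin.
RUN apt-get update && apt-get install --no-install-recommends -y pipx
ENV PATH="/root/.local/bin:$PATH"
RUN pipx install poetry && pipx inject poetry poetry-plugin-bundle

# Bundle the project and its locked dependencies into /venv.
WORKDIR /app
COPY . .
RUN poetry bundle venv --python=/usr/bin/python3 /venv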
FROM gcr.io/distroless/python3-debian12
COPY --from=builder /venv /venv
ENTRYPOINT ["/venv/bin/random-wikipedia-article"]
The first FROM directive introduces the build stage, where you build
and install your project. The base image is a slim variant of the
Debian stable release.
The second FROM directive defines the image that you deploy to
production. The base image is a distroless Python image for Debian
stable: Python language support minus the operating system.
If you have Docker installed, you can try this out. First, create a Dockerfile
in your project with the contents from Example 5-7. Next, build and run the
Docker image:
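A sketch of the commands (the image tag is arbitrary):
$ docker build -t random-wikipedia-article .
$ docker run --rm random-wikipedia-article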
Install the plugin with pipx and enable it for your project:
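The install step presumably uses pipx inject, like the other plugins in this chapter; the enable flag below is the setting that ends up in pyproject.toml:
$ pipx inject poetry poetry-dynamic-versioning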
[tool.poetry-dynamic-versioning]
enable = true
Remember that you have installed the Poetry plugin globally. The explicit
opt-in ensures that you don’t accidentally start overwriting the version field
in unrelated Poetry projects.
Build frontends like pip and build need the plugin when they build your
project. For this reason, enabling the plugin also adds it to the build
dependencies in pyproject.toml. The plugin brings its own build backend,
which wraps the one provided by Poetry:
[build-system]
requires = ["poetry-core>=1.0.0", "poetry-dynamic-versioning"]
build-backend = "poetry_dynamic_versioning.backend"
Poetry still requires the version field in its own section. Set the field to
"0.0.0" to indicate that it’s unused.
[tool.poetry]
version = "0.0.0"
You can now add a Git tag to set your project version:
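For example, assuming this is your first release:
$ git tag v1.0.0
The plugin can also substitute the version into your source code at build time. The substitution.folders setting below tells it to rewrite files under the src directory: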
[tool.poetry-dynamic-versioning]
enable = true
substitution.folders = [{path = "src"}]
import argparse

__version__ = "0.0.0"

def main():
    parser = argparse.ArgumentParser(prog="random-wikipedia-article")
    parser.add_argument(
        "--version", action="version", version=f"{parser.prog} {__version__}"
    )
    parser.parse_args()
    ...
Before proceeding, commit your changes, but without adding another Git
tag. Let’s try the option in a fresh installation of the project:
$ rm -rf .venv
$ uv venv
$ uv pip install --no-cache .
$ py -m random_wikipedia_article --version
random-wikipedia-article 1.0.0.post1.dev0+51c266e
As you can see, the plugin rewrote the __version__ attribute during
the build. Since you didn’t tag the commit, Dunamai marked the version as
a developmental post-release of 1.0.0 and appended the commit hash
using a local version identifier.
Summary
Poetry provides a unified workflow to manage packaging, dependencies
and environments. Poetry projects are interoperable with standard tooling:
you can build them with build and upload them to PyPI with Twine. But
the Poetry command-line interface also provides convenient shorthands for
these tasks and many more.
Poetry records the precise working set of packages in its lock file, giving
you deterministic deployments and checks, as well as a consistent
experience when collaborating with others. Poetry can track development
dependencies; it organizes them in dependency groups that you can install
separately or together. You can extend Poetry with plugins—for example, to
deploy the project into a virtual environment or to derive the version
number from Git.
1
Sébastien Eustace: “Support for PEP 621,” November 6, 2020.
2
The command also keeps your lock file and project environment up-to-date. If you edit the
constraint in pyproject.toml, you’ll need to do this yourself. Read on to learn more about lock files
and environments.
3
Apart from Poetry’s own poetry.lock and the closely related PDM lock file format, there’s pipenv’s
Pipfile.lock and the conda-lock format for Conda environments.
4
Replace bin with Scripts if you’re on Windows.
Part III. Testing and Static Analysis
Chapter 6. Testing with pytest
If you think back to when you wrote your first programs, you may recall a
common experience: You had an idea for how a program could help with a
real-life task, and spent a sizable amount of time coding it from top to
bottom, only to be confronted with screens full of disheartening error
messages when you finally ran it. Or, worse, it gave you results that were
subtly wrong.
There are a few lessons we’ve all learned from experiences like this. One is
to start simple and keep it simple as you iterate on the program. Another
lesson is to test early and repeatedly. Initially, this may just mean to run the
program manually and validate that it does what it should. Later on, if you
break the program into smaller parts, you can test those parts in isolation
and automatically. As a side effect, the program gets easier to read and
work on, too.
In this chapter, I’ll talk about how testing can help you produce value early
and consistently. Good tests amount to an executable specification of the
code you own. They set you free from institutional knowledge in a team or
company, and they speed up your development by giving you immediate
feedback on changes.
NOTE
Pytest originated in the PyPy project, a Python interpreter written in Python. Early on, the PyPy
developers worked on a separate standard library called std , later renamed to py . Its testing
module py.test became an independent project under the name pytest .
Writing a Test
Example 6-1 revisits the Wikipedia example from Chapter 3. The program
is as simple as it gets—yet it’s far from obvious how you’d write tests for it.
The main function has no inputs and no outputs—only side effects, such
as writing to the standard output stream. How would you test a function like
this?
import json
import textwrap
import urllib.request

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

def main():
    with urllib.request.urlopen(API_URL) as response:
        data = json.load(response)
    print(data["title"], end="\n\n")
    print(textwrap.fill(data["extract"]))
Let’s write an end-to-end test that runs the program in a subprocess and
checks that it completes with non-empty output. End-to-end tests run the
entire program the way an end user would (Example 6-2).
import subprocess
import sys

def test_output():
    args = [sys.executable, "-m", "random_wikipedia_article"]
    process = subprocess.run(args, capture_output=True, check=True)
    assert process.stdout
TIP
Tests written using pytest are functions whose names start with test . Use the built-in assert
statement to check for expected behavior. Pytest rewrites the language construct to provide rich error
reporting in case of a test failure.
random-wikipedia-article
├── pyproject.toml
├── src
│ └── random_wikipedia_article
│ ├── __init__.py
│ └── __main__.py
└── tests
├── __init__.py
└── test_main.py
[project.optional-dependencies]
tests = ["pytest>=8.1.1"]
Please refer to these steps when I ask you to add test dependencies later in
this chapter.
Finally, let’s run the test suite. If you’re on Windows, activate the
environment before you run the following command.
$ py -m pytest
========================= test session starts ===
platform darwin -- Python 3.12.2, pytest-8.1.1, p
rootdir: ...
collected 1 item
tests/test_main.py .
========================== 1 passed in 0.01s ====
TIP
Use py -m pytest even in Poetry projects. It’s both shorter and safer than poetry run
pytest . If you forget to install pytest into the environment, Poetry falls back to your global
environment. (The safe variant would be poetry run python -m pytest .)
Designing for Testability
Writing finer-grained tests for the program is much harder. The API
endpoint returns a random article, so which title and summary should the
tests expect? Every invocation sends an HTTP request to the real Wikipedia
API. Those network roundtrips will make the test suite excruciatingly slow
—and you can only run tests when your machine is connected to the
internet.
The term monkey patch for replacing code at runtime originated at Zope Corporation. Initially, people
at Zope called the technique “guerilla patching”, since it didn’t abide by the usual rules of patch
submission. People heard that as “gorilla patch”—and soon the more carefully crafted ones came to
be known as “monkey patches”.
While mocking and monkey-patching utilities serve their purpose, I'd encourage you to focus on the
root of the problem: Example 6-1 has no separation of concerns. A single
function serves as the application entry point, communicates with an
external API, and presents the results on the console. This makes it hard to
test its features in isolation.
Example 6-3 shows a refactoring that makes the code more testable. While
this version of the program is longer, it expresses its logic more clearly and
is more amenable to change. Good tests don’t just catch bugs: they improve
the design of your code.
from dataclasses import dataclass

@dataclass
class Article:
    title: str = ""
    summary: str = ""

def fetch(url):
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return Article(data["title"], data["extract"])
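# The show function itself isn't reproduced here. A plain-text sketch that's
# consistent with the tests below (the original layout may differ slightly):
def show(article, file):
    summary = textwrap.fill(article.summary)
    file.write(f"{article.title}\n\n{summary}\n")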
def main():
    article = fetch(API_URL)
    show(article, sys.stdout)
For brevity, examples in this chapter only show imports on first use.
The refactoring extracts fetch and show functions from main . It also
defines an Article class as the common denominator of these functions.
Let’s see how these changes let you test the parts of the program in isolation
and in a repeatable way.
The show function accepts any file-like object. While main passes
sys.stdout , tests can pass an io.StringIO instance to store the
output in memory. Example 6-4 uses this technique to check that the output
ends with a newline. The final newline ensures the output doesn’t run into
the next shell prompt.
import io

from random_wikipedia_article import Article, show

def test_final_newline():
    article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
    file = io.StringIO()
    show(article, file)
    assert file.getvalue().endswith("\n")
Suppose you later rewrite the show function to use Rich, as shown in Example 6-5. You won't need to adapt your tests!
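A sketch of what the Rich version of show might look like (the console options are illustrative):
def show(article, file):
    console = Console(file=file, width=72, highlight=False)
    console.print(article.title, style="bold", end="\n\n")
    console.print(article.summary)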
In fact, the whole point of tests is to give you confidence that your program
still works after making changes like this. Mocks and monkey patches, on
the other hand, are brittle: They tie your test suite to implementation details,
making it increasingly hard to change your program down the road.
@pytest.fixture
def file():
return io.StringIO()
Tests (and fixtures) can use a fixture by including a function parameter with
the same name. When pytest invokes the test function, it passes the return
value of the fixture function. Let’s rewrite Example 6-4 to use the fixture:
def test_final_newline(file):
    article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
    show(article, file)
    assert file.getvalue().endswith("\n")
WARNING
If you forget to add the parameter file to the test function, you get a confusing error: 'function' object has no attribute 'write' . This happens because the name file now refers to the fixture function in the same module.
If every test used the same article, you’d likely miss some edge cases. For
example, you don’t want your program to crash if an article comes with an
empty title. Example 6-6 runs the test for a number of articles with the @pytest.mark.parametrize decorator.
articles = [
    Article(),
    Article("test"),
    Article("Lorem Ipsum", "Lorem ipsum dolor sit amet."),
    Article(
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
        "Nulla mattis volutpat sapien, at dapibus ipsum accumsan eu.",
    ),
]
@pytest.mark.parametrize("article", articles)
def test_final_newline(article, file):
    show(article, file)
    assert file.getvalue().endswith("\n")
If you parameterize many tests in the same way, you can create a
parameterized fixture, a fixture with multiple values (Example 6-7). As
before, pytest runs the test once for each article in articles .
So what did you gain here? For one thing, you don’t need to decorate each
test with @pytest.mark.parametrize . There’s another advantage if
your tests aren’t all in the same module: You can place fixtures in a file
named conftest.py and use them across your entire test suite without
imports.
def parametrized_fixture(*params):
    return pytest.fixture(params=params)(lambda request: request.param)
Use the helper to simplify the fixture from Example 6-7. You can also inline
the articles variable from Example 6-6:
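The result might look something like this (article values as in Example 6-6):
article = parametrized_fixture(
    Article(),
    Article("test"),
    Article("Lorem Ipsum", "Lorem ipsum dolor sit amet."),
)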
import unittest

class TestShow(unittest.TestCase):
    def setUp(self):
        self.article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
        self.file = io.StringIO()

    def test_final_newline(self):
        show(self.article, self.file)
        self.assertEqual("\n", self.file.getvalue()[-1])
In unittest , tests are methods with names that start with test , in a
class derived from unittest.TestCase . The assert* methods let
you check for expected properties. The setUp method prepares the test
environment for each test—the test objects each test uses. In this case,
you’re setting up an Article instance and an output buffer for the
show function.
Run the test suite from the project directory using the command py -m
unittest .
$ py -m unittest
.
-------------------------------------------------
Ran 1 test in 0.000s
OK
The class tightly couples tests and the test environment. As a result,
you can’t reuse test objects as easily as with pytest fixtures.
The framework uses inheritance to provide shared functionality. This
couples tests with the framework plumbing, all in a single namespace
and instance.
Placing tests in a class makes them less readable than a module with
functions.
The assertion methods lack expressivity and generality—every type of
check requires a dedicated method. For example, you have
assertEqual and assertIn , but there’s no
assertStartsWith .
TIP
If you have a test suite written with unittest , there’s no need to rewrite it to start using pytest—
pytest “speaks” unittest , too. Use pytest as a test runner right away and you can rewrite your
test suite incrementally later.
def test_fetch(article):
with serve(article) as url:
assert article == fetch(url)
The serve helper function takes an article and returns a URL for fetching
the article. More precisely, it wraps the URL in a context manager, an
object for use in a with block. This allows serve to clean up after
itself when you exit the with block—by shutting down the server:
@contextmanager
def serve(article):
    ...  # start the server
    yield f"http://localhost:{server.server_port}"
    ...  # shut down the server
import http.server
import json
import threading
@contextmanager
def serve(article):
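    # The body of serve isn't reproduced here. A sketch of how it might work:
    # serve the article as JSON from a local HTTP server on a background
    # thread, and shut the server down when the with block exits.
    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            data = {"title": article.title, "extract": article.summary}
            body = json.dumps(data).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    with http.server.HTTPServer(("localhost", 0), Handler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        yield f"http://localhost:{server.server_port}"
        server.shutdown()
        thread.join()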
Starting a server for every test is wasteful, though. Pytest lets you scope a fixture to the entire test session, so all tests could share a single server:
@pytest.fixture(scope="session")
def httpserver():
    ...
That looks more promising, but how do you shut down the server when the
tests are done with it? Up to now, your fixtures have only prepared a test
object and returned it. You can’t run code after a return statement.
However, you can run code after a yield statement—so pytest allows
you to define a fixture as a generator.
A generator fixture prepares a test object, yields it, and cleans up resources
at the end—similar to a context manager. You use it in the same way as an
ordinary fixture that returns its test object. Pytest handles the setup and
teardown phases behind the scenes and calls your test function with the
yielded value.
@pytest.fixture(scope="session")
def httpserver():
    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            article = self.server.article
            data = {"title": article.title, "extract": article.summary}
            body = json.dumps(data).encode()
            ...  # as before
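    # The rest of the fixture isn't reproduced here. A sketch: start the server
    # once for the session, yield it, and shut it down during teardown.
    with http.server.HTTPServer(("localhost", 0), Handler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        yield server
        server.shutdown()
        thread.join()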
There’s still a missing piece: You need to define the serve function. The
function now depends on the httpserver fixture to do its work, so you
can’t just define it at the module level. Let’s move it into the test function
for now (Example 6-11).
Example 6-11. Testing the fetch function (version 2)
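A sketch of what the test might look like at this point, with serve defined inside it (the exact wiring in the original may differ):
def test_fetch(article, httpserver):
    def serve(article):
        httpserver.article = article
        return f"http://localhost:{httpserver.server_port}"

    assert article == fetch(serve(article))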
Store the article in the server, so the request handler can access it.
@pytest.fixture
def serve(httpserver):
    def f(article):
        httpserver.article = article
        return f"http://localhost:{httpserver.server_port}"

    return f
The outer function defines a serve fixture, which depends on
httpserver .
The inner function is the serve function you call in your tests.
Your test isn't tied to any particular HTTP client library. Example 6-14 swaps out the implementation of the fetch function to use HTTPX. This would have broken any test that used monkey patching—but your test will still pass!
import httpx

def fetch(url):
    fields = metadata("random-wikipedia-article")
    headers = {"User-Agent": USER_AGENT.format_map(fields)}
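    # The rest of fetch isn't reproduced here. Presumably it performs the
    # request with HTTPX along these lines:
    with httpx.Client(headers=headers, http2=True) as client:
        response = client.get(url, follow_redirects=True)
        response.raise_for_status()
        data = response.json()

    return Article(data["title"], data["extract"])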
The pytest-httpserver plugin provides a ready-made httpserver fixture, so you don't have to run your own server. Example 6-15 rewrites the serve fixture on top of it:
@pytest.fixture
def serve(httpserver):
    def f(article):
        json = {"title": article.title, "extract": article.summary}
        httpserver.expect_request("/").respond_with_json(json)
        return httpserver.url_for("/")

    return f
Example 6-15 configures the server to respond to requests to "/" with the
JSON representation of the article. The plugin offers much flexibility
beyond this use case—for example, you can add custom request handlers or
communicate over HTTPS.
As your test suite grows, you’ll be looking for ways to speed up test runs.
Here’s an easy way: utilize all your CPU cores. The pytest-xdist
plugin spawns a worker process on each processor and distributes tests
randomly across the workers. The randomization also helps detect hidden
dependencies between your tests.
$ py -m pytest -n auto
Instead, the factory-boy library lets you create factories for test
objects. You can generate batches of objects with predictable attributes,
such as by using a sequence number. Alternatively, you can populate
attributes randomly using the faker library.
from factory import Factory, Faker

class ArticleFactory(Factory):
    class Meta:
        model = Article

    title = Faker("sentence")
    summary = Faker("paragraph")

article = parametrized_fixture(*ArticleFactory.build_batch(10))
Use a random sentence for the title and a random paragraph for the
summary.
Other Plugins
Summary
In this chapter, you’ve learned how to test your Python projects with pytest:
Tests are functions that exercise your code and check for expected
behavior using the assert built-in. Prefix their names—and the
names of the containing modules—with test_ , and pytest will
discover them automatically.
Fixtures are functions or generators that set up and tear down test
objects; declare them with the @pytest.fixture decorator. You
can use a fixture in a test by including a parameter named like the
fixture.
Plugins for pytest can provide useful fixtures, as well as modify test
execution, enhance reporting, and much more.
One of the prime characteristics of good software is that it’s easy to change,
since any piece of code used in the real world must adapt to evolving
requirements and an ever-changing environment. Tests make change easier
in several ways:
This chapter focuses on the tooling side of things, but there’s so much more
to good testing practices. Luckily, other people have written fantastic texts
about this topic. Here are three of my all-time favorites:
If you want to know all about how to test with pytest, read Brian’s book:
Brian Okken, Python Testing with pytest: Simple, Rapid, Effective, and
Scalable, Second Edition (Raleigh: The Pragmatic Bookshelf, 2022).
1
Large packages can have modules with the same name—say, gizmo.foo.registry and
gizmo.bar.registry . Under pytest’s default import mode, test modules must have unique
fully-qualified names—so you must place the test_registry modules in separate
tests.foo and tests.bar packages.
2
Remember to add Rich to your project as described in “Specifying Dependencies for a Project”. If
you use Poetry, refer to “Managing Dependencies”.
3
My reviewer Hynek recommends a technique to avoid this pitfall and get an idiomatic
NameError instead. The trick is to name the fixture explicitly with
@pytest.fixture(name="file") . This lets you use a private name for the function, such as
_file , that doesn’t collide with the parameter.
4
Note the somewhat uncommon spelling variant parametrize instead of parameterize.
5
Remember to add a dependency on httpx[http2] to your project.
6
The cookiecutter-pytest-plugin template gives you a solid project structure for writing your own
plugin.
7
Test double is the umbrella term for the various kinds of objects tests use in lieu of the real objects
used in production code. A good overview is “Mocks Aren’t Stubs” by Martin Fowler, January 2,
2007.
Chapter 7. Measuring Coverage with
Coverage.py
How confident in a code change are you when your tests pass?
If you look at tests as a way to detect bugs, you can describe their
sensitivity and specificity.
The sensitivity of your test suite is the probability of a test failure when
there’s a defect in the code. If large parts of the code are untested, or if the
tests don’t check for expected behavior, you have low sensitivity.
The specificity of your tests is the probability that they will pass if the code
is free of defects. If your tests are flaky (they fail intermittently) or brittle
(they fail when you change implementation details), then you have low
specificity. Invariably, people stop paying attention to failing tests. This
chapter isn’t about specificity, though.
There’s a great way to boost the sensitivity of your tests: when you add or
change behavior, write a failing test before the code that makes it pass. If
you do this, your test suite will capture your expectations for the code.
Another effective strategy is to test your software with the various inputs
and environmental constraints that you expect it to encounter in the real
world. Cover the edge cases of a function, like empty lists or negative
numbers. Test common error scenarios, not just the “happy path”.
Code coverage is a measure of the extent by which the test suite exercises
your code. Full coverage doesn’t guarantee high sensitivity: If your tests
cover every line in your code, you can still have bugs. It’s an upper bound,
though. If your code coverage is 80%, then 20% of your code will never
trigger a test failure, no matter how many bugs creep in. It’s also a
quantitative measure amenable to automated tools. These two properties
make coverage a useful proxy for sensitivity.
In short, coverage tools record each line in your code when you run it. After
completion, they report the overall percentage of executed lines with
respect to the entire codebase.
Coverage tools aren’t limited to measuring test coverage. For example, code
coverage lets you find which modules an API endpoint in a large codebase
uses. Or you could use it to determine the extent to which code examples
document your project.
How does coverage measurement work in Python? The interpreter lets you
register a callback—a trace function—using the function
sys.settrace . From that point onwards, the interpreter invokes the
callback whenever it executes a line of code—as well as in some other
situations, like entering or returning from functions or raising exceptions.
Coverage tools register a trace function that records each executed line of
source code in a local database.
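A minimal sketch of the mechanism (not how Coverage.py is actually implemented, just the underlying hook):
import sys

def trace(frame, event, arg):
    # The interpreter calls this for every line, function call, return, and
    # exception; here we only record executed lines.
    if event == "line":
        print(f"{frame.f_code.co_filename}:{frame.f_lineno}")
    return trace

sys.settrace(trace)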
THE TRACE MODULE
tests/test_main.py .....................
The command asks trace to count how often each line is executed. It
writes the results to <module>.cover files in the coverage directory,
marking missed lines with the string >>>>>> . It also writes a summary to
the terminal, with coverage percentages for every module.
The summary includes modules from the standard library and third-party
packages. It’s easy to miss the fact that your __main__ module doesn’t
appear at all. If you’re curious why the __init__ module only has 92%
coverage, take a look at the file random_wikipedia_article.__init__.cover.
(Bear with me, we’ll get to those missing lines shortly.)
Using Coverage.py
Coverage.py is a mature and widely used code coverage tool for Python.
Created over two decades ago—predating PyPI and setuptools—and
actively maintained ever since, it has measured coverage on every
interpreter since Python 2.1.
Start by telling Coverage.py which packages to measure, using the tool.coverage.run table in pyproject.toml:
[tool.coverage.run]
source = ["random_wikipedia_article", "tests"]
TIP
Measuring code coverage for your test suite may seem strange—but you should always do it. It alerts you when tests don't run and helps you identify unreachable code within them. Treat your tests the same way you would treat any other code.
You can invoke coverage run with a Python script, followed by its
command-line arguments. Alternatively, you can use its -m option with an
importable module. Use the second method—it ensures that you run pytest
from the current environment:
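That is:
$ py -m coverage run -m pytest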
After running this command, you'll find a file named .coverage in the current directory. Coverage.py uses it to store the coverage data it gathered during the test run.
Enable the show_missing setting to include the missing lines in the coverage report:
[tool.coverage.report]
show_missing = true
$ py -m coverage report
Name Stmts
-------------------------------------------------
src/random_wikipedia_article/__init__.py 26
src/random_wikipedia_article/__main__.py 2
tests/__init__.py 0
tests/test_main.py 33
-------------------------------------------------
TOTAL 61
37 def main():
38 article = fetch(API_URL) # missing
39 show(article, sys.stdout) # missing
This is surprising—the end-to-end test from Example 6-2 runs the entire
program, so all of those lines are definitely being tested. For now, disable
coverage measurements for the __main__ module:
[tool.coverage.run]
omit = ["*/__main__.py"]
If you run both steps again, Coverage.py will report full code coverage.
Let’s make sure you’ll notice any lines that aren’t exercised by your tests.
Configure Coverage.py to fail if the percentage drops below 100% again:
[tool.coverage.report]
fail_under = 100
Branch Coverage
If an article has an empty summary, random-wikipedia-article
prints a trailing blank line (yikes). Those empty summaries are rare, but
they exist, and this should be a quick fix. Example 7-1 modifies show to
print only non-empty summaries.
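A sketch of what that change might look like, assuming the Rich-based show function from Chapter 6:
def show(article, file):
    console = Console(file=file, width=72, highlight=False)
    console.print(article.title, style="bold", end="\n\n")
    if article.summary:
        console.print(article.summary)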
Curiously, the coverage stays at 100%—even though you didn’t write a test
first.
On the other hand, the tests only exercised one of two code paths through
the function—they never skipped the if body. Coverage.py also supports
branch coverage, which looks at all the transitions between statements in
your code and measures the percentage of those traversed during the tests.
You should always enable it, as it’s more precise than statement coverage:
[tool.coverage.run]
branch = true
Re-run the tests, and you'll see Coverage.py flag the missing transition from the if statement on line 34 to the exit of the function. To cover the empty-summary case, update the article fixture and add a test that checks for trailing blank lines:
article = parametrized_fixture(
    Article("test"), *ArticleFactory.build_batch(10)
)

def test_trailing_blank_lines(article, file):
    show(article, file)
    assert not file.getvalue().endswith("\n\n")
Run the tests again—and they fail! Can you spot the bug in Example 7-1?
Empty summaries produce two blank lines: one to separate the title and the
summary, and one from printing the empty summary. You’ve only removed
the second one. Example 7-3 removes the first one as well. Thanks,
Coverage.py!
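A sketch of what that fix might look like, adding the separating blank line only when a summary follows:
def show(article, file):
    console = Console(file=file, width=72, highlight=False)
    end = "\n\n" if article.summary else "\n"
    console.print(article.title, style="bold", end=end)
    if article.summary:
        console.print(article.summary)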
Suppose your project needs to support Python 3.7, so its Python requirement in pyproject.toml is set accordingly:
[project]
requires-python = ">=3.7"
Next, check if your dependencies are compatible with the Python version.
Use uv to compile a separate requirements file for a Python 3.7
environment:
$ uv venv -p 3.7
$ uv pip compile --extra=tests pyproject.toml -o py37-dev-requirements.txt
× No solution found when resolving dependencies
The error indicates that your preferred version of HTTPX has already
dropped Python 3.7. Remove your lower version bound and try again. After
a few similar errors and removing the lower bounds of other packages,
dependency resolution finally succeeds. Restore the lower bounds using the
older versions of these packages.
You’ll also need the backport importlib-metadata (see
“Environment Markers”). Add the following entry to the
project.dependencies field:
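The entry presumably carries an environment marker, so the backport is only installed on interpreters that need it (the version bound here is illustrative):
"importlib-metadata>=6.7; python_version < '3.8'",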
Compile the requirements one more time. Finally, update your project
environment:
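These are the same commands used again in “Parallel Coverage” below:
$ uv pip sync py37-dev-requirements.txt
$ uv pip install -e . --no-deps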
Parallel Coverage
If you now re-run Coverage.py under Python 3.7, it reports the first branch
of the if statement as missing. This makes sense: your code executes the
else branch and imports the backport instead of the standard library.
First, switch the environment back to Python 3.12 using your original
requirements:
$ uv venv -p 3.12
$ uv pip sync dev-requirements.txt
$ uv pip install -e . --no-deps
With a single file for coverage data, it’s easy to erase data accidentally. If
you forget to pass the --append option, you’ll have to run the tests
again. You could configure Coverage.py to append by default, but that’s
error-prone, too: If you forget to run coverage erase periodically,
you’ll end up with stale data in your report.
There’s a better way to gather coverage across multiple environments.
Coverage.py lets you record coverage data in separate files on each run. Enable this behavior with the parallel setting:
[tool.coverage.run]
parallel = true
Coverage reports are always based on a single data file, even in parallel
mode. You merge the data files using the command coverage
combine . That turns the two-step process from earlier into a three-step
one: coverage run — coverage combine — coverage
report .
Let’s put all of this together. For each Python version, set up the
environment and run the tests, as shown here for Python 3.7:
$ uv venv -p 3.7
$ uv pip sync py37-dev-requirements.txt
$ uv pip install -e . --no-deps
$ py -m coverage run -m pytest
$ py -m coverage combine
Combined data file .coverage.somehost.26719.00190
Combined data file .coverage.somehost.26766.14631
$ py -m coverage report
Measuring in Subprocesses
At the end of “Using Coverage.py”, you had to disable coverage for the
main function and the __main__ module. But the end-to-end test
certainly exercises this code. Let’s remove the # pragma comment and
the omit setting and figure this out.
It turns out you don't need to change any code at all. You can place a .pth file in the environment that calls the function coverage.process_startup during interpreter startup. This leverages a little-known Python feature (see “Site Packages”): the interpreter executes lines in a .pth file if they start with an import statement.
You can find the site-packages directory under lib/python3.x on Linux and
macOS, and under Lib on Windows.
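A sketch of creating such a file, assuming an in-project environment on Linux or macOS (the file name is arbitrary, as long as it ends in .pth):
$ echo 'import coverage; coverage.process_startup()' \
    > .venv/lib/python3.12/site-packages/_coverage.pth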
$ export COVERAGE_PROCESS_START=pyproject.toml
Re-run the test suite, combine the data files, and display the coverage
report. Thanks to measuring coverage in the subprocess, the program should
have full coverage again.
NOTE
Measuring coverage in subprocesses only works in parallel mode. Without parallel mode, the main
process overwrites the coverage data from the subprocess, because both use the same data file.
The pytest-cov plugin aims to make everything work out of the box, behind the scenes—including subprocess coverage. The convenience comes at the price of a layer of indirection. Running Coverage.py directly gives you finer-grained control.
What Coverage to Aim For
Any coverage percentage below 100% means your tests won’t detect bugs
in some parts of your codebase. If you’re working on a new project, there
isn’t any other meaningful coverage target.
That doesn’t imply you should test every single line of code. Consider a log
statement for debugging a rare situation. The statement may be difficult to
exercise from a test. At the same time, it’s probably low-risk, trivial code.
Writing that test won’t increase your confidence in the code significantly.
Exclude the line from coverage using a pragma comment:
if rare_condition:
print("got rare condition") # pragma: no cov
Don’t exclude code from coverage just because it’s cumbersome to test.
When you start working with a new library or interfacing with a new
system, it usually takes some time to figure out how to test your code. But
often those tests end up detecting bugs that would have gone unnoticed and
caused problems in production.
For example, you may be testing a large function that, among other things,
also connects to the production database. Add an optional parameter that
lets you pass the connection from the outside. Tests can then pass a
connection to an in-memory database instead.
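A sketch of that kind of seam, with hypothetical names:
def process_orders(orders, connection=None):
    # Tests pass an in-memory database connection; production code omits the
    # argument and gets the real database.
    if connection is None:
        connection = connect_to_production_database()  # hypothetical helper
    ...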
Example 7-4 recaps the Coverage.py settings you’ve used in this chapter.
[tool.coverage.run]
source = ["random_wikipedia_article", "tests"]
branch = true
parallel = true
omit = ["*/__main__.py"] # avoid this if you can
[tool.coverage.report]
show_missing = true
fail_under = 100
Summary
You can measure the extent to which the test suite exercises your project
using Coverage.py. Coverage reports are useful for discovering untested
lines. Branch coverage captures the control flow of your program, instead
of isolated lines of source code. Parallel coverage lets you measure
coverage across multiple environments. You need to combine the data files
before reporting. Measuring coverage in subprocesses requires setting up a
.pth file and an environment variable.
1
Ned Batchelder: “You should include your tests in coverage,” August 11, 2020.
2
Under the hood, the .coverage file is just a SQLite database. Feel free to poke around if you have
the sqlite3 command-line utility ready on your system.
3
The name parallel is somewhat misleading; the setting has nothing to do with parallel
execution.
4
Martin Fowler: “Legacy Seam,” January 4, 2024.
Chapter 8. Automation with Nox
When you maintain a Python project, you’re faced with many chores.
Running checks on your code is an important part:
Testing helps you reduce the defect rate of your code (Chapter 6).
Coverage reporting pinpoints untested parts of your code (Chapter 7).
Linters analyze your source code to find ways to improve it
(Chapter 9).
Code formatters lay out the source code in a readable way (Chapter 9).
Type checkers verify the type correctness of your code (Chapter 10).
Other chores include:
You need to build and publish packages for distribution (Chapter 3).
You need to update the dependencies of your project (Chapter 4).
You need to deploy your service (see Example 5-7 in Chapter 5).
You need to build the documentation for your project.
Automating these tasks has many benefits. You focus on coding while the
check suite covers your back. You gain confidence in the steps that take
your code from development to production. You remove human error and
encode each process, so others can review and improve it.
Automation gives you leverage to make each step as repeatable, each result
as reproducible, as possible. Checks and tasks run in the same way on
developer machines and in continuous integration (CI). They run across
different Python versions, operating systems, and platforms.
The automation tool Nox helps with these chores. You write Nox sessions in plain Python: each session is a Python function that executes commands in a dedicated, isolated environment. Using Python as the automation language gives Nox great simplicity, portability, and expressivity.
First Steps with Nox
Install Nox globally using pipx:
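$ pipx install nox
Then create a noxfile.py at the top of your project; the session below gives you a first example to work from: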
import nox
@nox.session
def tests(session):
session.install(".", "pytest")
session.run("pytest")
Sessions are the central concept in Nox: each session comprises an
environment and some commands to run within it. You define a session by
writing a Python function decorated with @nox.session . The function
receives a session object as an argument, which you can use to install
packages ( session.install ) and run commands ( session.run )
in the session environment.
You can try the session with the example project from previous chapters.
For now, add your test dependencies to the session.install
arguments:
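For the example project, that might look like this (assuming the test dependencies from Chapter 6):
session.install(".", "pytest", "pytest-httpserver", "factory-boy")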
$ nox
nox > Running session tests
nox > Creating virtual environment (virtualenv) u
nox > python -m pip install . pytest
nox > pytest
========================= tests session starts ==
...
========================== 21 passed in 0.94s ===
nox > Session tests was successful.
As you can see from the output, Nox starts by creating a virtual
environment for the tests session using virtualenv . If you’re
curious, you can find this environment under the .nox directory in your
project.
NOTE
By default, environments use the same interpreter as Nox itself. In “Working with Multiple Python
Interpreters”, you’ll learn how to run sessions on another interpreter, and even across multiple ones.
First, the session installs the project and pytest into its environment. The
function session.install is just pip install underneath. You
can pass any appropriate options and arguments to pip. For example, you
can install your dependencies from a requirements file:
session.install("-r", "dev-requirements.txt")
session.install(".", "--no-deps")
session.install(".[tests]")
Nox lets you use uv instead of virtualenv and pip for creating
environments and installing packages. You can switch the backend to uv
by setting an environment variable:
$ export NOX_DEFAULT_VENV_BACKEND=uv
Second, the session runs the pytest command you just installed. If a
command fails, the session is marked as failed. By default, Nox continues
with the next session, but it will exit with a non-zero status at the end if any
session failed. In the run above, the test suite passes and Nox reports
success.
Example 8-2 adds a session to build packages for the project (see
Chapter 3). The session also validates the packages using Twine’s check
command.
import shutil
from pathlib import Path
@nox.session
def build(session):
    session.install("build", "twine")

    distdir = Path("dist")
    if distdir.exists():
        shutil.rmtree(distdir)
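    # The rest of the session isn't reproduced here. Presumably it builds the
    # packages and validates them, along these lines:
    session.run("python", "-m", "build")
    session.run("twine", "check", *Path().glob("dist/*"))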
Example 8-2 relies on the standard library for clearing out stale packages
and locating the freshly built ones: Path.glob matches files against
wildcards, and shutil.rmtree removes a directory and its contents.
TIP
Nox doesn’t implicitly run commands in a shell, unlike tools such as make . Shells differ widely
between platforms, so they’d make Nox sessions less portable. For the same reason, avoid Unix
utilities like rm or find in your sessions—use Python’s standard library instead!
If every contributor has Poetry available, the session can also invoke Poetry directly (Example 8-4). The external=True argument tells Nox that the command comes from outside the session environment:
@nox.session
def build(session):
    session.install("twine")
    session.run("poetry", "build", external=True)
    session.run("twine", "check", *Path().glob("dist/*"))
You’re trading off reliability for speed here. Example 8-2 works with any
build backend declared in pyproject.toml and installs it in an isolated
environment on each run. Example 8-4 assumes that contributors have a
recent version of Poetry on their system and breaks if they don’t. Prefer the
first method unless every developer environment has a well-known Poetry
version.
Working with Sessions
Over time, noxfile.py may accumulate a number of sessions. The --list
option gives you a quick overview of them. If you add module and function
docstrings with helpful descriptions, Nox includes them in the listing as
well.
$ nox --list
Run the checks and tasks for this project.
Running Nox with the --session option lets you select individual
sessions by name:
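For example:
$ nox --session tests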
During development, running nox repeatedly lets you catch errors early.
On the other hand, you don’t need to validate your packages each time.
Fortunately, you can change which sessions run by default by setting
nox.options.sessions :
nox.options.sessions = ["tests"]
Now, when you run nox without arguments, only the tests session
runs. You can still select the build session using the --session
option. Command-line options override values specified in nox.options in noxfile.py.
TIP
Keep your default sessions aligned with the mandatory checks for your project. Contributors should
be able to run nox without arguments to check if their code changes are acceptable.
Every time a session runs, Nox creates a fresh virtual environment and
installs the dependencies. This is a good default, because it makes the
checks strict, deterministic, and repeatable. You won’t miss problems with
your code due to stale packages in the session environment.
However, Nox gives you a choice. Setting up environments each time might
be a tad slow if you re-run your tests in quick succession while coding. You
can reuse environments with the option -r or --reuse-existing-
virtualenvs . Additionally, you can skip installation commands by
specifying --no-install , or combine these options using the
shorthand -R .
$ nox -R
nox > Running session tests
nox > Re-using existing virtual environment at .n
nox > pytest
...
nox > Session tests was successful.
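Passing a list of Python versions to @nox.session runs the session on each
of them. For example, a sketch of such a tests session:

@nox.session(python=["3.12", "3.11", "3.10"])
def tests(session):
    session.install(".", "pytest")
    session.run("pytest")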
Nox creates an environment for each version and runs the commands in
those environments in turn:
$ nox
nox > Running session tests-3.12
nox > Creating virtual environment (virtualenv) u
nox > python -m pip install . pytest
nox > pytest
...
nox > Session tests-3.12 was successful.
nox > Running session tests-3.11
...
nox > Running session tests-3.10
...
nox > Ran multiple sessions:
nox > * tests-3.12: success
nox > * tests-3.11: success
nox > * tests-3.10: success
TIP
Did you get errors from pip when you ran Nox just now? Don’t use the same compiled requirements
file for every Python version. You need to lock dependencies separately for each environment (see
“Session Dependencies”).
You can narrow down sessions by Python version using the option --
python :
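$ nox --python 3.12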
Session Arguments
So far, the tests session runs pytest without arguments:
session.run("pytest")
You could hardcode options by adding them to the invocation:
session.run("pytest", "--verbose")
But you don’t always want the same options for pytest. For example, the -
-pdb option launches the Python debugger on test failures. The debug
prompt can be a life saver when you investigate a mysterious bug. But it’s
worse than useless in a CI context: it would hang forever since there’s
nobody to enter commands. Similarly, when you work on a feature, the -k
option lets you run tests with a specific keyword in their name—but you
wouldn’t want to hardcode it in noxfile.py either.
Fortunately, Nox lets you pass additional command-line arguments for a
session. The session can forward the session arguments to a command or
evaluate them for its own purposes. Session arguments are available in the
session as session.posargs . Example 8-6 shows how you forward
them to a command like pytest.
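In essence, forwarding means unpacking session.posargs into the command
invocation, roughly:

@nox.session
def tests(session):
    session.install(".", "pytest")
    session.run("pytest", *session.posargs)

Everything you pass after a -- separator on the nox command line ends up in
session.posargs, for example: nox --session tests -- --pdb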
Automating Coverage
Coverage tools give you a sense of how much your tests exercise the
codebase (see Chapter 7). In a nutshell, you install the coverage
package and invoke pytest via coverage run . Example 8-7 shows how
to automate this process with Nox:
Example 8-7. Running tests with code coverage
[tool.coverage.run]
parallel = true
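A sketch of such a session, assuming the tests extra from earlier in the
chapter (the parallel option above ensures each run writes a separate data
file):

@nox.session(python=["3.12", "3.11", "3.10"])
def tests(session):
    session.install(".[tests]", "coverage[toml]")
    session.run("coverage", "run", "-m", "pytest", *session.posargs)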
In Chapter 7, you installed your project in editable mode. The Nox session
builds and installs a wheel of your project instead. This ensures that you’re
testing the final artifact you’re distributing to users. But it also means that
Coverage.py needs to map the installed files back to your source tree.
Configure the mapping in pyproject.toml:
[tool.coverage.paths]
source = ["src", "*/site-packages"]
This maps files installed in the site-packages directory in an
environment to files in your src directory. The key source is an
arbitrary identifier; it’s needed because you can have multiple
mappings in this section.
Example 8-8 aggregates the coverage files and displays the coverage report:
@nox.session
def coverage(session):
    session.install("coverage[toml]")
    if any(Path().glob(".coverage.*")):
        session.run("coverage", "combine")
    session.run("coverage", "report")
Unlike Example 8-7, this session runs on the default Python version and
installs only Coverage.py. You don’t need to install your project to generate
the coverage report.
If you run these sessions on the example project, make sure to configure
Coverage.py as shown in Chapter 7. Include Python 3.7 in the tests
session if your project uses the conditional import for importlib-
metadata .
The coverage session still reports missing coverage for the main function and the __main__
module. You’ll take care of that in “Automating Coverage in Subprocesses”.
Session Notification
As it stands, this noxfile.py has a subtle problem. Until you run the
coverage session, your project will be littered with data files waiting to
be processed. And if you haven’t run the tests session recently, the data
in those files may be stale—so your coverage report won’t reflect the latest
state of the codebase.
Example 8-9 triggers the coverage session to run automatically after the
test suite. Nox supports this with the session.notify method. If the
notified session isn’t already selected, it runs after the other sessions have
completed.
sysconfig.get_path("purelib")
If you called the function directly in your session, it would return a location
in the environment where you’ve installed Nox. Instead, you need to query
the interpreter in the session environment. You can do this by running
python with session.run :
output = session.run(
    "python",
    "-c",
    "import sysconfig; print(sysconfig.get_path('purelib'))",
    silent=True,
)
The silent keyword lets you capture the output instead of echoing it to
the terminal. Thanks to pathlib from the standard library, writing the
.pth file now only takes a couple of statements:
purelib = Path(output.strip())
(purelib / "_coverage.pth").write_text(
"import coverage; coverage.process_startup()"
)
Example 8-10 extracts these statements into a helper function. The function
takes a session argument, but it isn’t a Nox session—it lacks the
@nox.session decorator. In other words, the function won’t run unless
you call it from a session.
def install_coverage_pth(session):
    output = session.run(...)  # see above
    purelib = Path(output.strip())
    (purelib / "_coverage.pth").write_text(...)
You’re almost done. What’s left is invoking the helper function from the
tests session and passing the environment variable to coverage .
Example 8-11 shows the final session.
@nox.session(python=["3.12", "3.11", "3.10"])
def tests(session):
    session.install(".[tests]")
    install_coverage_pth(session)

    try:
        args = ["coverage", "run", "-m", "pytest", *session.posargs]
        session.run(*args, env={"COVERAGE_PROCESS_START": "pyproject.toml"})
    finally:
        session.notify("coverage")
Install the dependencies before the .pth file. The order matters
because the .pth file imports the coverage package.
With subprocess coverage enabled, the end-to-end test produces the missing
coverage data for the main function and the __main__ module. Invoke
nox and watch it run your tests and generate a coverage report. Here’s
what the report should look like:
-------------------------------------------------
src/.../__init__.py 29 0 8 0
src/.../__main__.py 2 0 0 0
tests/__init__.py 0 0 0 0
tests/test_main.py 36 0 6 0
-------------------------------------------------
TOTAL 67 0 14 0
nox > Session coverage was successful.
Parameterizing Sessions
The phrase “works for me” describes a common story: a user reports an
issue with your code, but you can’t reproduce the bug in your environment.
Runtime environments in the real world differ in a myriad of ways. Testing
across Python versions covers one important variable. Another common
cause of surprise is the packages that your project uses directly or indirectly
—its dependency tree.
Nox offers a powerful technique for testing your project against different
versions of a dependency. Parameterization allows you to add parameters
to your session functions and supply predefined values for them; Nox runs
the session with each of these values.
@nox.session
@nox.parametrize("django", ["5.*", "4.*", "3.*"])
def tests(session, django):
    session.install(".", "pytest-django", f"django=={django}")
    session.run("pytest")
@nox.session
@nox.parametrize("a", ["1.0", "0.9"])
@nox.parametrize("b", ["2.2", "2.1"])
def tests(session, a, b):
    print(a, b)  # all combinations of a and b
If you only want to check for certain combinations, you can combine
parameters in a single @nox.parametrize decorator:
@nox.session
@nox.parametrize(["a", "b"], [("1.0", "2.2"), ("0.9", "2.1")])
def tests(session, a, b):
    print(a, b)  # only the combinations listed above
When running a session across Python versions, you're effectively
parameterizing the session by the interpreter. In fact, Nox lets you write the
following instead of passing the versions to @nox.session:⁴

@nox.session
@nox.parametrize("python", ["3.12", "3.11", "3.10"])
def tests(session):
    ...
This syntax is useful when you want specific combinations of Python and
the dependency. Here's an example: As of this writing, Django 3.2 (LTS)
doesn’t officially support Python versions newer than 3.10. Consequently,
you need to exclude these combinations from the test matrix. Example 8-13
shows how.
@nox.session
@nox.parametrize(
    ["python", "django"],
    [
        (python, django)
        for python in ["3.12", "3.11", "3.10"]
        for django in ["5.*", "4.*", "3.*"]
        if not (django == "3.*" and python in ("3.11", "3.12"))
    ],
)
def tests(session, django):
    ...
Session Dependencies
If you’ve followed Chapter 4 closely, you may see some problems with the
way Example 8-8 and Example 8-11 install packages. Here are the relevant
parts again:
@nox.session
def tests(session):
session.install(".[tests]")
...
@nox.session
def coverage(session):
session.install("coverage[toml]")
...
Running checks without locking dependencies has two drawbacks. First, the
checks aren’t deterministic: subsequent runs of the same session may install
different packages. Second, if a dependency breaks your project, checks fail
until you exclude the release or another release fixes the problem.⁵ In other
words, any project you depend on, even indirectly, has the power to block
your entire CI pipeline.
On the other hand, lock file updates are a constant churn, and they clutter
your Git history. Reducing their frequency comes at the price of running
checks with stale dependencies. If you don’t require locking for other
reasons, such as secure deployments—and you’re happy to quickly fix a
build when an incompatible release wreaks havoc on your CI—you may
prefer to keep your dependencies unlocked. There ain’t no such thing as a
free lunch.
Constraints files look similar to requirements files: Each line lists a package
with a version specifier. Unlike requirements files, however, constraints
files don’t cause pip to install a package—they only control which version
pip selects if it needs to install the package.
A constraints file works great for locking session dependencies. You can
share it across sessions while only installing the packages each session
needs. Its only drawback, compared to using a set of requirements files, is
that you need to resolve all dependencies together, so there’s a higher
chance of dependency conflicts.
@nox.session(venv_backend="uv")
def lock(session):
    session.run(
        "uv",
        "pip",
        "compile",
        "pyproject.toml",
        "--upgrade",
        "--quiet",
        "--all-extras",
        "--output-file=constraints.txt",
    )
TIP
Don’t forget to commit the constraints file to source control. You need to share this file with every
contributor, and it needs to be available in CI.
@nox.session
def coverage(session):
    session.install("-c", "constraints.txt", "coverage[toml]")
    ...
def constraints(session):
    filename = f"python{session.python}-{sys.platform}.txt"
    return Path("constraints") / filename
Example 8-17 updates the lock session to generate the constraints files.
The session now runs on every Python version. It uses the helper function
to build the path for the constraints file, ensures that the target directory
exists, and passes the filename to uv.
The tests and coverage sessions can now reference the appropriate
constraints file for each Python version. For this to work, you have to
declare a Python version for the coverage session as well.
@nox.session(python="3.12")
def coverage(session):
    session.install("-c", constraints(session), "coverage[toml]")
    ...
Before I show you how to use Poetry in Nox sessions, let me call out a
couple of differences between Poetry environments and Nox environments.
There’s no right and wrong here. Poetry environments are perfect for ad-
hoc interactions with your project during development, with every tool just
a poetry run away. Nox environments, on the other hand, are
optimized for reliable and repeatable checks; they aim to be as isolated and
deterministic as possible.
When you use Poetry in a Nox session, it’s good to be mindful of these
differences. I recommend these guidelines for invoking poetry
install with Nox:
def install(session, groups, root=True):
    session.run_install(
        "poetry",
        "install",
        "--no-root",
        "--sync",
        f"--only={','.join(groups)}",
        external=True,
    )
    if root:
        session.install(".")
[tool.poetry.group.coverage.dependencies]
coverage = {extras = ["toml"], version = ">=7.4.4"}
[tool.poetry.group.tests.dependencies]
pytest = ">=8.1.1"
And here’s what the coverage session looks like with the helper
function:
@nox.session
def coverage(session):
    install(session, groups=["coverage"], root=False)
    ...
TIP
How does Poetry know to use a Nox environment instead of the Poetry environment? Poetry installs
packages into the active environment, if one exists. When Nox runs Poetry, it activates the session
environment by exporting the VIRTUAL_ENV environment variable (see “Virtual Environments”).
Summary
Nox lets you automate checks and tasks for a project. Its Python
configuration file noxfile.py organizes them into one or more sessions.
Sessions are functions decorated with @nox.session . They receive a
single argument session providing the session API (Table 8-1). Every
session runs in an isolated virtual environment. If you pass a list of Python
versions to @nox.session , Nox runs the session across all of them.
There’s a lot more to Nox that this chapter didn’t cover. For example, you
can use Conda or Mamba to create environments and install packages. You
can organize sessions using keywords and tags, and assign friendly
identifiers using nox.param . Last but not least, Nox comes with a
GitHub Action that makes it easy to run Nox sessions in CI. Take a look at
the official documentation to learn more.
1
In case you’re wondering, always use the plural form nox.options.sessions in noxfile.py.
On the command line, both --session and --sessions work. You can specify any number
of sessions with these options.
2
Alan Kay, “What is the story behind Alan Kay’s adage Simple things should be simple, complex
things should be possible?”, Quora Answer, June 19, 2020.
3
Like pytest, Nox uses the alternate spelling “parametrize” to protect your “E” keycap from
excessive wear.
4
The eagle-eyed reader may notice that python is not a function parameter here. If you do need it
in the session function, use session.python instead.
5
Semantic Versioning constraints do more harm here than they help. Bugs occur in all releases, and
your upstream’s definition of a breaking change may be narrower than you like. See Hynek
Schlawack: “Semantic Versioning Will Not Save You,” March 2, 2021.
6
Run poetry lock --no-update after editing pyproject.toml to update the poetry.lock file.
Chapter 9. Linting with Ruff and pre-
commit
Linters don’t run a program to discover issues with it; they read and analyze
its source code. This process is known as static analysis, as opposed to
runtime (or dynamic) analysis. It makes linters both fast and safe—you
needn’t worry about side effects, such as requests to production systems.
Static checks can be smart and also fairly complete—you needn’t hit the
right combination of edge cases to dig up a latent bug.
NOTE
Static analysis is powerful, but you should still write tests for your programs. Where static checks use
deduction, tests use observation. Linters verify a limited set of generic code properties, while tests
can validate that a program satisfies its requirements.
Linters are also great at enforcing a readable and consistent style, with a
preference for idiomatic and modern constructs over obscure and
deprecated syntax. Organizations have adopted style guides for years, such
as the recommendations in PEP 8 or the Google Style Guide for Python.
Linters can function as executable style guides: by flagging offending
constructs automatically, they keep code review focused on the meaning of
a change, rather than stylistic nitpicks.
But first, let’s look at a typical problem that linters help you solve.
Linting Basics
The constructs flagged by linters may not be outright illegal. More often,
they just trigger your spider sense that something might be wrong. Consider
the Python code in Example 9-1.
import subprocess
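# The rest of Example 9-1 isn't shown above. A sketch of the kind of function
# under discussion, with hypothetical names and a mutable default for `args`:
def run_command(command, args=[], force=False):
    if force:
        args.append("--force")
    subprocess.run(["git", command, *args], check=True)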
If you haven’t been bitten by this bug before, you may be surprised to find
that the function sometimes passes --force to the command when it
shouldn’t:
Linters can detect pitfalls like this, warn you about them, and even fix them
for you. Let’s use a linter named Ruff on the function—you’ll hear a lot
more about it in this chapter. For now, just take note of its error message,¹
which identifies the bug:
Two tools have dominated linting for much of Python’s history: Pylint and
Flake8.
In recent years, the Python code quality ecosystem has seen significant
changes, spearheaded by three tools: Black, mypy, and Ruff.
The rise of the code formatter Black has made countless stylistic
checks unnecessary. It’s named after Henry Ford’s adage, “any color
the customer wants, as long as it’s black.” Black uncompromisingly
applies the same code style to every project that adopts it.
Type checkers like mypy have taken over the large realm of type-
related linter checks—for example, detecting function calls with the
wrong type of argument. I’ll discuss type checking in Chapter 10.
Ruff re-implements an entire ecosystem of Python code-quality tooling
in Rust, including Flake8, Pylint, and Black. Besides offering a unified
experience, Ruff speeds up checks by up to two orders of magnitude.²
Historically, linters only emitted warnings about offending code, leaving the
thankless task of improving it to humans. Modern linters can fix many
violations automatically, including sophisticated tasks such as refactoring a
Python codebase to use modern language features.
Astral, the company behind Ruff, also created the Python packaging tool uv
(see “Managing Environments with uv”), and they’ve assumed the
stewardship of Rye, a Python project manager (see “Managing Packages
with Rye”). All of these tools are implemented in Rust.
TIP
If you manage your project with Rye, the Ruff linter and code formatter are available under the
commands rye lint and rye fmt, respectively.
But wait—how can pipx install a Rust program? The Ruff binary is
available as a wheel on PyPI, so Python folks like you and me can install it
with good old pip and pipx. You could even run it with py -m ruff .
When you refactor f-strings, it’s easy to leave the f prefix behind after
removing placeholders. Ruff flags f-strings without placeholders—they’re
noisy, they confuse readers, and somebody might have forgotten to include
a placeholder.
Run the command ruff check —the front-end for Ruff’s linter. Without
arguments, the command lints every Python file under your current
directory, unless it’s listed in a .gitignore file:
$ ruff check
example.py:2:12: F541 [*] f-string without any placeholders
Found 1 error.
[*] 1 fixable with the `--fix` option.
Ruff tells you where the violation occurred—the file, line, and line offset—
and gives you a short summary of what went wrong: f-string without any
placeholders. Two interesting bits are sandwiched between the location and
the summary: An alphanumeric code ( F541 ) identifies the linter rule, and
the sign [*] indicates that Ruff can automatically fix the issue.
If you’re ever confused why you’re getting a warning, you can ask Ruff to
explain it using the command ruff rule :
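$ ruff rule F541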
What it does
Checks for f-strings that do not contain any placeholders.
Rule codes have a prefix of one or more letters, followed by three or more
digits. The prefix identifies a specific linter—for example, the F in F541
stands for the Pyflakes linter. Ruff re-implements many more Python code-
quality tools—as of this writing, it ships over 50 built-in plugins modeled
after existing tools. You can find out which linters are available using the
command ruff linter :
$ ruff linter
F Pyflakes
E/W pycodestyle
C90 mccabe
I isort
N pep8-naming
D pydocstyle
UP pyupgrade
... (50+ more lines)
You can activate linters and individual rules for your project in its
pyproject.toml file. The setting tool.ruff.lint.select enables
any rules whose code starts with one of the given prefixes. Out of the box,
Ruff enables some basic all-around checks from Pyflakes and Pycodestyle:
[tool.ruff.lint]
select = ["E4", "E7", "E9", "F"]
TIP
If you aren't using an opinionated code formatter, consider enabling the entire E and W blocks.
Their automatic fixes help ensure minimal PEP 8 compliance. They're similar to, but not yet as
feature-complete as, the Autopep8 formatter (see "Approaches to Code Formatting: Autopep8").³
Ruff has too many rules to describe in this book, and more are being added
all the time. How do you find the good ones for your project? Try them out!
Depending on your project, you may want to enable individual rules
( "B006" ), groups of rules ( "E4" ), entire plugins ( "B" ), or even every
existing plugin at the same time ( "ALL" ).
WARNING
Reserve the special ALL code for experimentation: it will implicitly enable new linters whenever
you upgrade Ruff. Beware: some plugins require configuration to produce useful results, and some
rules conflict with other rules.⁴
[tool.ruff.lint]
select = ["E", "W", "F", "B006"]
If you’re unsure where to start, Table 9-1 describes a dozen built-in plugins
to try.
Table 9-1. A dozen widely useful Ruff plugins
When onboarding legacy projects to Ruff, your first task will be to decide
which linters provide the most useful feedback. At this stage, individual
diagnostics can be quite overwhelming. It helps to zoom out using the --
statistics option:
At this point, you have two options. First, if a linter is particularly noisy,
hide it from the output using the --ignore option. For example, if
you’re not ready to add type annotations and docstrings, exclude flake8-
annotations and pydocstyle with --ignore ANN,D . Second,
if you see a linter with interesting findings, enable it permanently in
pyproject.toml and fix its warnings. Rinse and repeat.
TIP
Work towards enforcing the same set of linters for all your projects, with the same configurations,
and prefer default configurations over customizations. This will make your codebase more consistent
and accessible across the entire organization.
The select setting is flexible, but purely additive: it lets you opt into
rules whose code starts with a given prefix. The ignore setting lets you
fine-tune in the other direction: it disables individual rules and rule groups.
Like select , it matches rule codes by their prefixes.
The subtractive method is handy when you need most, but not all, of a
linter’s rules, and when you’re adopting a linter gradually. The
pydocstyle plugin ( D ) checks that every module, class, and function
has a well-formed docstring. Your project may be almost there, with the
exception of module docstrings ( D100 ). Use the ignore setting to
disable all warnings about missing module docstrings until you’ve fully
onboarded your project:
[tool.ruff.lint]
select = ["D", "E", "F"]
ignore = ["D100"] # Don't require module docstri
The per-file-ignores setting lets you disable rules for a part of your
codebase. Here's another example: The bandit plugin ( S ) has a rich
inventory of checks to help you detect security vulnerabilities in your code.
Its rule S101 flags every use of the assert keyword.⁵ But you still
need assert to express expectations in pytest (see Chapter 6). If your
test suite lives in a tests directory, disable S101 for its files like this:
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"] # Tests can use assertions
Always include rule codes in your noqa comments. Blanket noqa comments can hide unrelated
issues. Marking violations also makes them easier to find when you’re ready to fix them. Use the rule
PGH004 from the pygrep-hooks linter to require rule codes.
The noqa system lets you silence false positives as well as legitimate
warnings that you choose not to prioritize at this point in time. For example,
the MD5 message-digest algorithm is generally agreed to be insecure, and
Bandit’s S324 flags its uses. But if your code interacts with a legacy
system that requires you to compute an MD5 hash, you may not have much
of a choice. Disable the warning with a noqa comment:
md5 = hashlib.md5(text.encode()).hexdigest()  # noqa: S324
Bandit’s checks often flag constructs that deserve close scrutiny, without
meaning to outright ban them. The idea is that you will vet the offending
lines one by one and suppress the warning if you determine the specific
usage to be innocuous.
It can be reasonable to enable a rule and suppress all of its warnings. This
lets you enforce a rule going forward only—that is, only when you touch a
region of code. Ruff supports this workflow with the --add-noqa
option, which inserts noqa comments to all offending lines on your
behalf:
$ ruff check --add-noqa
Here’s a Nox session that runs Ruff on every Python file in the current
directory:
@nox.session
def lint(session):
    session.install("ruff")
    session.run("ruff", "check")
Nox is a valid choice here, but when it comes to linting, there’s a more
convenient and powerful alternative: pre-commit, a cross-language linter
framework with Git integration.
Let’s add a pre-commit hook for Ruff to your project. Create a file named
.pre-commit-config.yaml in the top-level directory, with contents as in
Example 9-3. You’ll find a short YAML fragment like this in the public
documentation of most linters that support pre-commit.
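For Ruff, a minimal configuration along these lines registers its linter hook
(the revision matches the version used later in this chapter):

repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff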
Authors distribute their pre-commit hooks via Git repositories. In the .pre-
commit-config.yaml file, you specify the URL, revision, and hooks for each
repository you want to use. The URL can be any location Git can clone
from. The revision is most commonly a Git tag pointing to the latest release
of the linter. A repository can have more than one hook—for example, Ruff
provides ruff and ruff-format hooks for its linter and code
formatter, respectively.
Pre-commit is intimately tied to Git, and you must invoke it from within a
Git repository. Let’s establish a baseline by linting every file in the
repository, using the command pre-commit run with the --all-
files option:
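$ pre-commit run --all-files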
A Hook Up Close
If you’re curious how a pre-commit hook works under the hood, take a peek
at Ruff’s hook repository. The file .pre-commit-hooks.yaml in the repository
defines the hooks. Example 9-4 shows an excerpt from the file.
- id: ruff
  name: ruff
  language: python
  entry: ruff check --force-exclude
  args: []
  types_or: [python, pyi]
Every hook comes with a unique identifier and a friendly name ( id and
name ). Refer to hooks by their unique identifier when you interact with
pre-commit. Their names only appear in console messages from the tool.
The hook definition tells pre-commit how to install and run the linter by
specifying its implementation language ( language ) and its command
and command-line arguments ( entry and args ). The Ruff hook is a
Python package, so it specifies Python as the language. The --force-
exclude option ensures that you can exclude files from linting. It tells
Ruff to honor its exclude setting even when pre-commit passes
excluded source files explicitly.
TIP
You can override the args key in your .pre-commit-config.yaml file to pass custom command-line
options to a hook. By contrast, command-line arguments in the entry key are mandatory—you
can’t override them.
Finally, the hook declares which file types the linter understands
( types_or ). The python file type matches files with .py or related
extensions and executable scripts with a Python shebang. The pyi file
type refers to stub files with type annotations (see “Distributing Types with
Python Packages”).
Figure 9-2. Three projects with pre-commit hooks for Ruff, Black, and Flake8.
Automatic Fixes
Modern linters can fix many violations by modifying the offending source
files in place. Linters with automatic fixes eliminate entire classes of bugs
and code smells at nearly zero cost.⁶ Like code formatters, they have caused
a paradigm shift in software development, letting you focus on higher-level
concerns without compromising on code quality.
WARNING
Automatic fixes bring tremendous benefits, but they assume some basic Git hygiene: Don’t pile up
uncommitted changes in your repository (or stash them before linting). Pre-commit saves and
restores your local modifications in some contexts, but not all.
Let’s try this out. When Ruff detects the mutable argument default, it
indicates that you can enable a “hidden” fix. (Ruff asks you to opt into the
fix because people might conceivably depend on mutable defaults, say, for
caching.) First, enable the linter rule and the fix in pyproject.toml:
[tool.ruff.lint]
extend-select = ["B006"]
extend-safe-fixes = ["B006"]
Ruff’s pre-commit hook requires you to opt in with the --fix option, as
shown in Example 9-5. The options --show-fixes and --exit-
non-zero-on-fix ensure that all violations are displayed in the
terminal and result in a non-zero exit status, even if Ruff was able to fix
them.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
        args: ["--fix", "--show-fixes", "--exit-non-zero-on-fix"]
Save Example 9-1 in a file called bad.py, commit the file, and run pre-
commit:
Fixed 1 error:
- bad.py:
1 × B006 (mutable-argument-default)
If you inspect the modified file, you’ll see that Ruff has replaced the
argument default with None . The empty list is now assigned inside the
function, giving every call its own instance of args .
Instead of inspecting the modified files, you can also run git diff to
see the changes applied to your code. Alternatively, you can tell pre-commit
to show you a diff of the fixes right away, using the option --show-
diff-on-fail .
@nox.session
def lint(session):
    options = ["--all-files", "--show-diff-on-fail"]
    session.install("pre-commit")
    session.run("pre-commit", "run", *options, *session.posargs)
By default, pre-commit runs every hook you’ve configured for your project.
You can run specific hooks by passing them as additional command-line
arguments. This comes in handy when addressing complaints from a
specific linter. Thanks to session.posargs (see “Session
Arguments”), this also works from Nox:
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
This command installs a short wrapper script into the .git/hooks directory
that transfers control to pre-commit (Figure 9-3). Programs in the .git/hooks
directory are known as Git hooks. When you run git commit , Git
invokes the pre-commit Git hook. The hook, in turn, invokes pre-commit,
which runs Ruff and any other pre-commit hooks you have.
Git hooks let you trigger actions at predefined points during Git’s
execution. For example, the pre-commit and post-commit Git hooks run
before and after Git creates a commit. You’ve probably guessed by now
which of these Git hooks pre-commit installs by default—but it supports the
other Git hooks as well, if you need them.
PRE-COMMIT HOOKS THAT AREN’T PRE-COMMIT HOOKS
Most pre-commit hooks plug into the pre-commit Git hook—but not all.
Linters for commit messages like commitlint and gitlint use the
commit-msg Git hook. Git calls this hook after you’ve composed a commit
message. You can install it using the --hook-type or -t option:
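$ pre-commit install --hook-type commit-msg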
But passing this option is easy to forget. If you use Git hooks other than
pre-commit, list them in the .pre-commit-config.yaml file instead:
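default_install_hook_types: [pre-commit, commit-msg]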
Figure 9-4 depicts a typical workflow with pre-commit. On the left, there’s
a file you’re editing in your project (worktree); the center represents the
staging area for the next commit (index); and the current commit is on the
right ( HEAD ).
Figure 9-4. Workflow with pre-commit
Initially, the three areas are in sync. Suppose you remove the placeholder
from the f-string, but forget to remove the f prefix from the string literal
(marked as 1 in Figure 9-4). You stage your edit using git add (2) and
run git commit to create a commit (3a). Before your editor can pop up
for the commit message, Git transfers control to pre-commit. Ruff promptly
catches your mistake and fixes the string literal in your worktree (3b).
At this point, all three areas have different contents. Your worktree contains
your change with Ruff’s fix, the staging area has your change without the
fix, and HEAD still points to the commit before your change. This lets you
audit the fix by comparing the worktree to the staging area, using git
diff . If you’re happy with what you see, you can stage the fix with git
add (4) and retry the commit with git commit (5).
With automatic fixes, this workflow reduces the interference of linters to a
minimum: review the fix, stage it, and rerun the commit. But sometimes you don't want to be
distracted by linters at all—for example, you may want to record some
work in progress. Git and pre-commit give you two options to get your
commit past a stubborn linter. First, you can skip Git hooks entirely using
the --no-verify or -n option:
$ git commit -n
Alternatively, you can skip a specific pre-commit hook using the SKIP
environment variable (which also takes a comma-separated list, if you need
to skip more than one hook):
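$ SKIP=ruff git commit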
Git hooks control which changes enter your local repository, but they’re
voluntary—they don’t replace CI checks as a gatekeeper for the default
branch in your shared repository. If you already run Nox in CI, the session
in Example 9-6 takes care of that.
Skipping hooks doesn’t help with false positives or when you want to
deploy a critical fix despite minor nits: your mandatory checks would still
fail in CI. In these cases, you’ll need to advise the linter to ignore the
specific violation (see “Disabling Rules and Warnings”).
The Ruff Formatter
Over the course of months, Ruff reimplemented a plethora of Python linters
behind the ruff check command and saw wide adoption in the Python
world. A bit over a year in, Ruff acquired the ruff format command.⁸
The Ruff formatter reimplements the de-facto standard of Python code
formatting, Black, in Rust. It provides yet another building block for the
integrated and highly performant toolchain that Ruff has become for
Python.
Many years ago, I was working in a small team on a C++ codebase. Our
processes were simple: everybody committed directly to the main branch.
There was no CI pipeline—just nightly builds and a dashboard showing
compiler errors, warnings, and test failures. Every now and then, one of us
went over recent changes with refactorings and style cleanups. It may sound
surprising, but the codebase was in good shape. Nothing beats mutual
understanding in a close-knit team.
A few years down the road, cracks appeared in our workflow. The company
had grown—new engineers were unfamiliar with the coding conventions,
senior engineers struggled to communicate them. On top of this, the team
had inherited a legacy codebase that didn’t adhere to any discernible rules at
all.
def create_frobnicator_factory(the_factory_name,
interval_in_sec
use_singleton=Non
if dbg:print('creating frobnication factory '+t
if(use_singleton): return _frob_sngltn #
def create_frobnicator_factory(the_factory_name,
interval_in_secs=1
use_singleton=None
if dbg:
print('creating frobnication factory '+th
if (use_singleton):
return _frob_sngltn # we're done
return FrobnicationFactory(the_factory_name,
intrvl=interval_in
You’ll likely find this easier on the eye. For better or worse, Autopep8
didn’t touch some other questionable stylistic choices, such as the rogue
blank line in the return statement and the inconsistent quote characters.
Autopep8 uses Pycodestyle to detect issues, and Pycodestyle had no
complaint here.
TIP
Unlike most code formatters, Autopep8 lets you apply selected fixes by passing --select with
appropriate rule codes. For example, you can run autopep8 --select=E111 to enforce four-
space indentation.
Developed at Google in 2015, the YAPF formatter borrows its design and
sophisticated formatting algorithm from clang-format. The name
YAPF stands for "Yet Another Python Formatter."⁹ YAPF reformats a
codebase according to a wealth of configuration options.
In 2018, a new code formatter named Black entered the scene. Its core
principle: minimal configurability!
def create_frobnicator_factory(
    the_factory_name,
    interval_in_secs=100,
    dbg=False,
    use_singleton=None,
    frobnicate_factor=4.5,
):
    if dbg:
        print("creating frobnication factory " + the_factory_name)
    if use_singleton:
        return _frob_sngltn  # we're done
    return FrobnicationFactory(
        the_factory_name, intrvl=interval_in_secs
    )
Black doesn’t fix individual style offenses like Autopep8, nor does it
enforce your style preferences like YAPF. Rather, Black reduces the source
code into a canonical form, using a deterministic algorithm—mostly
without taking existing formatting into account. In a certain sense, Black
makes code style “disappear.” This normalization massively reduces the
cognitive overhead of working with Python code.
Black took the Python world by storm, with project after project deciding to
“blacken” their source files.
ONBOARDING A CODEBASE TO BLACK
First, why give up conscious style choices and carefully handcrafted code?
You’ll have to make this decision yourself—but consider the following
points: What’s the ongoing cost of enforcing your style without the help of
automated tools? Do your code reviews focus on the meaning of a change
rather than coding style? How long does it take new engineers to get up to
speed?
Second, are the changes safe? Black guarantees that the abstract syntax tree
(AST) of the source code—that is, the parsed representation of the program,
as seen by the interpreter—doesn't change, except for some well-known
divergences that preserve semantic equivalence.¹⁰
Third, when you commit the changes, how do you prevent them from
cluttering up the output of git blame ? It turns out that you can
configure Git to ignore the commit when annotating files. Store the full 40-
character commit hash in a file named .git-blame-ignore-revs in the root of
the repository. Then run the following command:
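$ git config blame.ignoreRevsFile .git-blame-ignore-revs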
Black’s code style becomes invisible once you’ve worked with it for a
while. Inevitably, though, some of its choices have led to controversy, even
forks. To understand its formatting rules, it helps to look at Black’s goal of
producing readable and consistent source code in a predictable and
repeatable way.
Take the default of double quotes for string literals, for example. According
to the style recommendations of PEP 257 and PEP 8, both docstrings and
English text with apostrophes already require double quotes. Choosing
double quotes over single quotes therefore results in a more consistent style
overall.
NOTE
Reducing dependencies between edits helps different people work on the same code. But it also lets
you separate or reorder drive-by bugfixes or refactorings and back out tentative commits before
submitting your changes for code review.
Black takes some cues from the formatted source code besides comments.
One example is the blank lines that divide a function body into logical
partitions. Likely the most powerful way of affecting Black’s output,
however, is the magic trailing comma: if a sequence contains a trailing
comma, Black splits its elements across multiple lines, even if they would
fit on a single line.
Black provides an escape hatch to let you disable formatting for a region of
code (Example 9-7).
@pytest.mark.parametrize(
("value", "expected"),
# fmt: off
[
("first test value", "61df19525cf97
("another test value", "5768979c48c30
("and here's another one", "e766977069039
]
# fmt: on
)
def test_frobnicate(value, expected):
assert expected == frobnicate(value)
Hand-formatting can be useful for program data, such as large tables with
properly aligned columns.
Ruff aims for full compatibility with the Black code style. Unlike Black,
Ruff lets you opt into single quotes and indentation using tabs. However,
I’d recommend adhering to Black’s widely adopted style nonetheless.
When run without arguments, ruff format processes any Python files
beneath the current directory. Instead of invoking Ruff manually, add it to
your pre-commit hooks, as shown in Example 9-8.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.4
    hooks:
      - id: ruff
        args: ["--fix", "--show-fixes", "--exit-non-zero-on-fix"]
      - id: ruff-format
Summary
In this chapter, you’ve seen how to improve and preserve the code quality
in your projects using linters and code formatters. Ruff is an efficient
reimplementation of many Python code-quality tools in Rust, including
Flake8 and Black. While it’s possible to run Ruff and other tools manually,
you should automate this process and include it as a mandatory check in CI.
One of the best options is pre-commit, a cross-language linter framework
with Git integration. Invoke pre-commit from a Nox session to keep a
single entry point to your suite of checks.
1
The B short code activates a group of checks pioneered by flake8-bugbear , a plugin for the
Flake8 linter.
2
Charlie Marsh: “Python tooling could be much, much faster”, August 30, 2022.
3
As of this writing, you’ll also need to enable Ruff’s preview mode. Set
tool.ruff.lint.preview to true .
4
My reviewer Hynek disagrees. He sets his projects to ALL and opts out of rules that don’t apply to
him. “Otherwise, you’ll miss new rules. If something starts failing after an update, you can take
action.”
5
What’s wrong with assertions? Nothing, but Python skips them when run with -O for
optimizations—a common way to speed up production environments. So don’t use assert to
validate untrusted input!
6
Kent Beck and Martin Fowler describe code smells as “certain structures in the code that suggest—
sometimes, scream for—the possibility of refactoring.” Martin Fowler: Refactoring: Improving the
Design of Existing Code, Second Edition, Boston: Addison-Wesley, 2019.
7
Running pre-commit from Git is the safest way to run linters with automatic fixes: Pre-commit
saves and restores any changes you haven’t staged, and it rolls back the fixes if they conflict with
your changes.
8
Charlie Marsh: “The Ruff Formatter”, October 24, 2023.
9
Stephen C. Johnson, the author of Lint, also established this infamous naming convention by writing
Yacc (Yet Another Compiler-Compiler) in the early 1970s at Bell Labs.
10
“AST before and after formatting,” Black documentation. Last accessed: March 22, 2024.
11
You can inspect the AST of a source file with the standard ast module, using py -m ast
example.py .
Chapter 10. Using Types for Safety and
Inspection
Statically typed languages, like C++, require you to declare the types of
variables upfront (unless the compiler is smart enough to infer them
automatically). In exchange, compilers ensure a variable only ever holds
compatible values. That eliminates entire classes of bugs. It also enables
optimizations: compilers know how much space the variable needs to store
its values.
Dynamically typed languages break with this paradigm: they let you assign
any value to any variable. Scripting languages like JavaScript and Perl even
convert values implicitly—say, from strings to numbers. This radically
speeds up the process of writing code. It also gives you more leeway to
shoot yourself in the foot.
import math
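# A sketch of the idea with a hypothetical variable:
number = "1.21"           # the name refers to a str
number = float(number)    # now it refers to a float
print(math.sqrt(number))  # 1.1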
In Python, a variable is just a name for a value. Variables don’t have types
—values do. The program associates the same name, number , first with a
value of type str , then with a value of type float . But unlike Perl and
similar languages, Python never converts the values behind your back, in
eager anticipation of your wishes:
>>> math.sqrt("1.21")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be real number, not str
Most Python functions don’t check the types of their arguments at all.
Instead, they simply invoke the operations they expect their arguments to
provide. Fundamentally, the type of a Python object doesn’t matter as long
as its behavior is correct. Taking inspiration from Vaucanson’s mechanical
duck from the times of Louis XV, this approach is known as duck typing:
“If it looks like a duck and quacks like a duck, then it must be a duck.”
As an example, consider the join operation in concurrent code. This
operation lets you wait until some background work completes, “joining”
the threads of control back together, as it were. Example 10-1 defines a
duck-typed function that invokes join on a number of tasks, waiting for
each in turn.
def join_all(joinables):
    for task in joinables:
        task.join()
You can use this function with Thread or Process from the standard
threading or multiprocessing modules—or with any other
object that has a join method with the correct signature. (You can’t use it
with strings because str.join takes an argument—an iterable of
strings.) Duck typing means that these classes don’t need a common base
class to benefit from reuse. All the types need is a join method with the
correct signature.
Duck typing is great because the function and its callers can evolve fairly
independently—a property known as loose coupling. Without duck typing,
a function argument has to implement an explicit interface that specifies its
behavior. Python gives you loose coupling for free: you can pass literally
anything, as long as it satisfies the expected behavior.
If you’ve ever had to read an entire codebase to grasp the purpose of a few
lines within it, you know what I mean: it can be impossible to understand a
Python function in isolation. Sometimes, the only way to decipher what’s
going on is to look at its callers, their callers, and so on (Example 10-2).
To some extent, type checkers can deduce the type of a function or variable
automatically, using a process called type inference. They become much
more powerful when you give programmers a way to specify types
explicitly in their code. By the middle of the last decade, and thanks in
particular to the foundational work of Jukka Lehtosalo and collaborators,¹
the Python language acquired a way to express the types of functions and
variables in source code, called type annotations (Example 10-3).
Example 10-3. A function with type annotations
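Annotations for parameters and the return value generally look like this (the
names here are illustrative):

def format_lines(lines: list[str], indent: int = 0) -> str:
    prefix = " " * indent
    return "\n".join(prefix + line for line in lines)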
On their own, type annotations mostly don’t affect the runtime behavior of
a program. The interpreter doesn’t check that assignments are compatible
with the annotated type; it merely stores the annotation inside the special
__annotations__ attribute of the containing module, class, or
function. While this incurs a small overhead at runtime, it means you can
inspect type annotations at runtime to do exciting stuff—say, construct your
domain objects from values transmitted on the wire, without any
boilerplate.
One of the most important uses of type annotations, though, doesn’t happen
at runtime: static type checkers, like mypy, use them to verify the
correctness of your code without running it.
Editors and IDEs leverage type annotations to give you a better coding
experience, with auto-completion, tooltips, and class browsers. You can also
inspect type annotations at runtime, unlocking powerful features such as
data validation and serialization.
If you use type annotations in your own code, you reap more benefits. First,
you’re also a user of your own functions, classes, and modules—so all the
benefits above apply, like auto-completion and type checking. Additionally,
you’ll find it easier to reason about your code, refactor it without
introducing subtle bugs, and build a clean software architecture. As a
library author, typing lets you specify an interface contract on which your
users can rely, while you’re free to evolve the implementation.
In this chapter, you’ll learn how to verify the type safety of your Python
programs using the static type checker mypy and the runtime type checker
Typeguard. You’ll also see how runtime inspection of type annotations can
greatly enhance the functionality of your programs. But first, let’s take a
look at the typing language that has evolved within Python over the past
decade.
Try out the small examples in this section on one of the type-checker playgrounds:
Variable Annotations
You can annotate a variable with the type of values that it may be assigned
during the course of the program. The syntax for such type annotations
consists of the variable name, a colon, and a type:
answer: int = 42
Besides the simple built-in types like bool , int , float , str , or
bytes , you can also use standard container types in type annotations,
such as list , tuple , set , or dict . For example, here’s how you
might initialize a variable used to store a list of lines read from a file:
lines: list[str] = []
While the previous example was somewhat redundant, this one provides
actual value: Without the type annotation, the type checker can’t deduce
that you want to store strings in the list.
The built-in containers are examples of generic types—types that take one
or more arguments. Here’s an example of a dictionary mapping strings to
integers. The two arguments of dict specify the key and value types,
respectively:
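fruit_count: dict[str, int] = {"banana": 3, "apple": 2}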
Tuples are a bit special, because they come in two flavors. Tuples can be a
combination of a fixed number of types, such as a pair of a string and int:
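pair: tuple[str, int] = ("answer", 42)     # a fixed number of elements
counts: tuple[int, ...] = (1, 1, 2, 3, 5)  # the other flavor: any length, all int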
Any class you define in your own Python code is also a type:
class Parrot:
pass
class NorwegianBlue(Parrot):
pass
In general, the Python typing language requires that the type on the right-
hand side of a variable assignment be a subtype of the type on the left-hand
side. A prime example of the subtype relation is the relationship of a
subclass to its base class, like NorwegianBlue and Parrot .
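parrot: Parrot = NorwegianBlue()  # fine: a NorwegianBlue is a Parrot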
TIP
Typing rules also permit assignments if the type on the right is consistent with that on the left. This
lets you assign an int to a float , even though int isn’t derived from float . The Any
type is consistent with any other type (see “Gradual Typing”).
Union Types
You can combine two types using the pipe operator ( | ) to construct a
union type, which is a type whose values range over all the values of its
constituent types. For example, you can use it for a user ID that’s either
numeric or a string:
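user_id: int | str = 42
user_id = "admin"  # also fine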
Arguably the most important use of the union type is for “optional” values,
where the missing value is encoded by None . Here’s an example where a
description is read from a README, provided that the file exists:
description: str | None = None
if readme.exists():
    description = readme.read_text()
Union types are another example for the subtype relation: Each type
involved in the union is a subtype of the union. For example, str and
None are each subtypes of the union type str | None .
I skipped over None above when discussing the built-in types. Strictly
speaking, None is a value, not a type. The type of None is called
NoneType , and it’s available from the standard types module. For
convenience, Python lets you write None in annotations to refer to the
type, as well.
How do you tell the type checker that your use of description is fine?
Generally, you should just check that the variable isn’t None . The type
checker will pick up on this and allow you to use the variable:
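if description is not None:
    for line in description.splitlines():
        ...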
There are several methods for type narrowing, as this technique is known. I
won’t discuss them all in detail here. As a rule of thumb, the control flow
must only reach the line in question when the value has the right type—and
type checkers must be able to infer this fact from the source code. For
example, you could also use the assert keyword with a built-in function
like isinstance :
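assert isinstance(description, str)

for line in description.splitlines():
    ...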
If you already know that the value has the right type, you can help out the
type checker using the cast function from the typing module:
description_str = cast(str, description)
for line in description_str.splitlines():
    ...
At runtime, the cast function just returns its second argument. Unlike
isinstance , it works with arbitrary type annotations.
Gradual Typing
In Python, every type ultimately derives from object . This is true for
user-defined classes and primitive types alike, even for types like int or
None . In other words, object is a universal supertype in Python—you
can assign literally anything to a variable of this type.
This may sound kind of powerful, but it really isn’t. In terms of behavior,
object is the smallest common denominator of all Python values, so
there’s precious little you can do with it, as far as type checkers are
concerned:
number: object = 2
print(number + number) # error: Unsupported left
There’s another type in Python that, like object , can hold any value. It’s
called Any (for obvious reasons) and it’s available from the standard
typing module. When it comes to behavior, Any is object ’s polar
opposite. You can invoke any operation on a value of type Any —
conceptually, it behaves like the intersection of all possible types. Any
serves as an escape hatch that lets you opt out of type checking for a piece
of code:
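For instance, mirroring the object example above:

from typing import Any

number: Any = 2
print(number.upper())  # type checkers accept this, but it crashes at runtime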
In the first example, the object type results in a false positive: the code
works at runtime, but type checkers will reject it. In the second example,
the Any type results in a false negative: the code crashes at runtime, but
type checkers won’t flag it.
WARNING
When you’re working in typed Python code, watch out for Any . It can disable type-checking to a
surprising degree. For example, if you access attributes or invoke operations on Any values, you’ll
end up with more Any values.
The Any type is Python’s hat trick that lets you restrict type checking to
portions of a codebase—formally known as gradual typing. In variable
assignments and function calls, Any is consistent with every other type,
and every type is consistent with it.
There are at least a couple of reasons why gradual typing is valuable. First,
Python existed without type annotations for two decades, and Python’s
governing body has no intentions to make type annotations obligatory.
Therefore, typed and untyped Python will coexist for the foreseeable future.
Second, Python's strength comes in part from its ability to be highly
dynamic where needed—for example, Python makes it easy to assemble or
even modify classes on the fly. In some cases, it's hard (or outright
impossible) to apply strict types to such highly dynamic code.³
Function Annotations
As you may recall from Example 10-3, type annotations for function
arguments look quite similar to those for variables. Return types, on the
other hand, are introduced with a right arrow instead of a colon—after all,
the colon already introduces the function body in Python. For example,
here’s a type-annotated function for adding two numbers:
Type checkers assume that a function without a return type returns Any .
Likewise, function parameters without annotations default to Any .
Effectively, this disables type checking for the function—exactly the
behavior you’d want in a world with large bodies of untyped Python code.
import subprocess
from typing import Any
Annotating Classes
The rules for variable and function annotations also apply in the context of
class definitions, where they describe instance variables and methods. You
can omit the annotation for the self argument in a method. Type
checkers can infer instance variables from assignments in a __init__
method:
class Swallow:
    def __init__(self, velocity: float) -> None:
        self.velocity = velocity

Alternatively, you can write the same class as a dataclass:

@dataclass
class Swallow:
    velocity: float
The dataclass-style definition isn’t only more concise than the handwritten
one, it also confers the class additional runtime behavior—such as the
ability to compare instances for equality based on their attributes, or to
order them.
When you’re annotating classes, the problem of forward references often
appears. Consider a two-dimensional point, with a method to compute its
Euclidean distance from another point:
import math
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float

    def distance(self, other: Point) -> float:
        return math.sqrt((self.x - other.x) ** 2 + (self.y - other.y) ** 2)
While type checkers are happy with this definition, the code raises an
exception when you run it with the Python interpreter:⁴
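Traceback (most recent call last):
  ...
NameError: name 'Point' is not defined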
Python doesn’t let you use Point in the method definition, because
you’re not done defining the class—the name doesn’t exist yet. There are
several ways to resolve this situation. First, you can write the forward
reference as a string to avoid the NameError , a technique known as
stringized annotations.
@dataclass
class Point:
    def distance(self, other: "Point") -> float:
        ...
Second, you can implicitly stringize all annotations in the current module
using the annotations future import:

from __future__ import annotations

@dataclass
class Point:
    def distance(self, other: Point) -> float:
        ...
The third method does not help with all forward references, but it does here.
You can use the special Self type to refer to the current class:
from typing import Self

@dataclass
class Point:
    def distance(self, other: Self) -> float:
        ...
Type Aliases
You can use the type keyword5 to introduce an alias for a type:
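For example, here's a recursive alias for JSON-like data; the name and exact
shape are assumptions based on how the alias is used later in this chapter:

type JSON = None | bool | int | float | str | list[JSON] | dict[str, JSON]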
Generics
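The discussion below refers to a helper that returns the first element of a
list of strings. A minimal sketch of that non-generic version might look like
this (the error handling is assumed):

def first(values: list[str]) -> str:
    for value in values:
        return value
    raise ValueError("values must not be empty")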
There’s no reason to restrict the element type to a string: the logic doesn’t
depend on it. Let’s make the function generic for all types. First, replace
str with the placeholder T . Second, mark the placeholder as a type
variable by declaring it in square brackets after the function name. (The
name T is just a convention; you could name it anything.) Additionally,
there’s no reason to restrict the function to lists, because it works with any
type over which you can iterate in a for loop—in other words, any
iterable.
from collections.abc import Iterable
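# A sketch of the generic version (details assumed): the type variable T is
# declared in square brackets after the function name.
def first[T](values: Iterable[T]) -> T:
    for value in values:
        return value
    raise ValueError("values must not be empty")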
Here’s how you might use the generic function in your code:
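For instance, with some made-up values:

fruit: str = first(["banana", "orange", "apple"])
number: int = first([13, 7, 42])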
You can omit the variable annotations for fruit and number , by the
way—type checkers infer them from the annotation of your generic
function.
NOTE
Generics with the [T] syntax are supported in Python 3.12+ and the Pyright type checker. If you
get an error, omit the [T] suffix from first and use TypeVar from the typing module:
T = TypeVar("T")
Protocols
The join_all function from Example 10-1 works with threads,
processes, or any other objects you can join. Duck typing makes your
functions simple and reusable. But how can you verify the implicit contract
between the functions and their callers?
Protocols bridge the gap between duck typing and type annotations. A
protocol describes the behavior of an object, without requiring the object to
inherit from it. It looks somewhat like an abstract base class—a base class
that doesn’t implement any methods:
from typing import Protocol

class Joinable(Protocol):
    def join(self) -> None:
        ...
The Joinable protocol requires the object to have a join method that
takes no arguments and returns None . The join_all function can use
the protocol to specify which objects it supports:
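A sketch of what that could look like, assuming Iterable from collections.abc
is in scope (the parameter name is an assumption):

def join_all(joinables: Iterable[Joinable]) -> None:
    for task in joinables:
        task.join()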
Static Type Checking with Mypy
To check your code with mypy, first add it to your project—for example, as an
extra in pyproject.toml:

[project.optional-dependencies]
typing = ["mypy>=1.9.0"]
You can now install mypy in the project environment:
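One way to do that, assuming a pip-based workflow and the typing extra defined
above:

$ py -m pip install -e ".[typing]"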
If you use Poetry, add mypy to your project using poetry add :
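For example (the group name and version constraint are assumptions):

$ poetry add --group typing "mypy>=1.9.0"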
With mypy installed, point it at your source tree:

$ py -m mypy src
Success: no issues found in 2 source files
Let’s type-check some code with a type-related bug. Consider the following
program, which passes None to a function that expects a string:
import textwrap

data = {"title": "Lorem Ipsum"}  # sample data (assumed); note the missing "extract" key
summary = data.get("extract")
summary = textwrap.fill(summary)
If you run mypy on this code, it dutifully reports that the argument in the
call to textwrap.fill isn’t guaranteed to be a string:
$ py -m mypy example.py
example.py:5: error: Argument 1 to "fill" has incompatible type "str | None";
expected "str"  [arg-type]
Found 1 error in 1 file (checked 1 source file)
Let’s revisit the Wikipedia API client from Example 6-3. In a fictional
scenario, sweeping censorship laws have been passed. Depending on the
country you’re connecting from, the Wikipedia API omits the article
summary.
You could store an empty string when this happens. But let’s be principled:
An empty summary isn’t the same as no summary at all. Let’s store None
when the response omits the field.
@dataclass
class Article:
    title: str = ""
    summary: str | None = None
A few lines below, the show function reformats the summary to ensure a
line length of 72 characters or fewer:
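The code in question looks roughly like this (a sketch; note that the
parameters deliberately lack type annotations):

def show(article, file):
    summary = textwrap.fill(article.summary, width=72)
    file.write(summary + "\n")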
Presumably, mypy will balk at this error, just like it did above. Yet, when
you run it on the file, it’s all sunshine. Can you guess why?
$ py -m mypy src
Success: no issues found in 2 source files
Mypy doesn’t complain about the call because the article parameter
doesn’t have a type annotation. It considers article to be Any , so the
expression article.summary also becomes Any . ( Any is
infectious.) As far as mypy is concerned, that expression can be str ,
None , and a pink elephant, all at the same time. This is gradual typing in
action, and it’s also why you should be wary of Any types and missing
annotations in your code.
You can help mypy detect the error by annotating the parameter as
article: Article . Try actually fixing the bug, as well—think about
how you would handle the case of summaries being None in a real
program. Here’s one way to solve this:
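For instance, you might fall back to a placeholder when the summary is missing
(a sketch, not necessarily the book's solution; TextIO is explained just below):

def show(article: Article, file: TextIO) -> None:
    summary = article.summary or "(no summary available)"
    file.write(textwrap.fill(summary, width=72) + "\n")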
Strict Mode
By default, mypy is fairly lenient with unannotated code. You can opt into much
more thorough checking by enabling strict mode in pyproject.toml:

[tool.mypy]
strict = true
If you run mypy in strict mode against the unannotated version of the module,
you'll see a flurry of errors:

$ py -m mypy src
__init__.py:16: error: Function is missing a type annotation
__init__.py:22: error: Function is missing a type annotation
__init__.py:30: error: Function is missing a return type annotation
__init__.py:30: note: Use "-> None" if function does not return a value
__init__.py:31: error: Call to untyped function "..." in typed context
__init__.py:32: error: Call to untyped function "..." in typed context
__main__.py:3: error: Call to untyped function "main" in typed context
Found 6 errors in 2 files (checked 2 source files)
Example 10-4 shows the module with type annotations and introduces two
concepts you haven’t seen yet. First, the Final annotation marks
API_URL as a constant—a variable to which you can’t assign another
value. Second, the TextIO type is a file-like object for reading and
writing strings ( str ), such as the standard output stream. Otherwise, the
type annotations should look fairly familiar.
import json
import sys
import textwrap
import urllib.request
from dataclasses import dataclass
from typing import Final, TextIO
@dataclass
class Article:
    title: str = ""
    summary: str = ""
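The remainder of the module might look like the following sketch; the URL and
the function bodies are assumptions based on the earlier examples:

API_URL: Final = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

def fetch(url: str) -> Article:
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return Article(data["title"], data["extract"])

def show(article: Article, file: TextIO) -> None:
    summary = textwrap.fill(article.summary, width=72)
    file.write(summary + "\n")

def main() -> None:
    article = fetch(API_URL)
    show(article, sys.stdout)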
I recommend strict mode for any new Python project, because it’s much
easier to annotate your code as you write it. Strict checks give you more
confidence in the correctness of your program, because type errors are less
likely to be masked by Any .
TIP
My other favorite mypy setting in pyproject.toml is the pretty flag. It displays source snippets
and indicates where the error occurred:
[tool.mypy]
pretty = true
Let mypy’s strict mode be your North Star when adding types to an existing
Python codebase. Mypy gives you an arsenal of finer and coarser-grained
ways to relax type checking when you’re not ready to fix a type error.
For example, you can permit calls to untyped functions from a single module,
using a per-module override:

[[tool.mypy.overrides]]
module = "<module>"
disallow_untyped_calls = false

Replace <module> with the module that has the untyped calls. The module name is
always given as a quoted string; you can append .* to cover an entire package.
You can also relax the check globally:

[tool.mypy]
disallow_untyped_calls = false
You can even disable all type errors for a given module:
[[tool.mypy.overrides]]
module = "<module>"
ignore_errors = true
All through this book, you’ve automated checks for your projects using
Nox. Nox sessions allow you and other contributors to run checks easily
and repeatedly during local development, the same way they’d run on a
continuous integration (CI) server.
Example 10-5 shows a Nox session for type-checking your project with
mypy:
import nox
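# A sketch of such a session (the Python versions and layout are assumptions;
# the exact listing in Example 10-5 may differ).
@nox.session(python=["3.12", "3.11", "3.10"])
def mypy(session: nox.Session) -> None:
    session.install(".", "mypy")
    session.run("mypy", "src")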
Just like you run the test suite across all supported Python versions, you
should also type-check your project on every Python version. This practice
is fairly effective at ensuring that your project is compatible with those
versions, even when your test suite doesn’t exercise that one code path
where you forgot about backwards compatibility.
NOTE
You can also pass the target version using mypy’s --python-version option. However,
installing the project on each version ensures that mypy checks your project against the correct
dependencies. These may not be the same on all Python versions.
Inevitably, as you type-check on multiple versions, you’ll get into situations
where either the runtime code or the type annotations don’t work across all
versions. For example, Python 3.9 deprecated typing.Iterable in
favor of collections.abc.Iterable . Use conditional imports
based on the Python version, as shown below. Static type checkers
recognize Python version checks in your code, and they will base their type
checks on the code relevant for the current version.
import sys
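# A sketch of a version-conditional import (an assumed example): import
# Iterable from collections.abc on Python 3.9+, and fall back to typing
# on older versions.
if sys.version_info >= (3, 9):
    from collections.abc import Iterable
else:
    from typing import Iterable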
Another sticking point: typing features not yet available at the low end of
your supported Python version range. Fortunately, these often come with
backports in a third-party library named typing-extensions . For
example, Python 3.11 added the useful Self annotation, which denotes
the currently enclosing class. If you support versions older than that, add
typing-extensions to your dependencies and import Self from
there:
import sys

if sys.version_info >= (3, 11):
    from typing import Self
else:
    from typing_extensions import Self
You may wonder why the Nox session in Example 10-5 installs the project
into mypy’s virtual environment. By nature, a static type checker operates
on source code; it doesn’t run your code. So why install anything but the
type checker itself?
To see why this matters, consider the version of the Wikipedia project in
Example 6-5 and Example 6-14, where you implemented the show and
fetch functions using Rich and httpx . How can a type checker
validate your use of a specific version of a third-party package?
Rich and httpx are, in fact, fully type annotated. They include an empty
marker file named py.typed next to their source files. When you install the
packages into a virtual environment, the marker file allows static type
checkers to locate their types.
Many Python packages distribute their types inline with py.typed markers.
However, other mechanisms for type distribution exist. Knowing them is
useful when mypy can’t import the types for a package.
For example, the factory-boy library doesn’t yet ship with types—
instead, you need to install a stubs package named types-factory-boy from
PyPI.7 A stubs package is a Python package containing typing stubs, a special
kind of Python source file with a .pyi suffix that has only type annotations
and no executable code.
If you’re entirely out of luck and types for your dependency simply don’t
exist, disable the mypy error in pyproject.toml, like this:
[[tool.mypy.overrides]]
module = "<package>"
ignore_missing_imports = true
NOTE
Python’s standard library doesn’t include type annotations. Type checkers vendor stubs from the
third-party typeshed project for standard library types, so you don’t have to worry about supplying them.
Treat your tests like you would treat any other code. Type-checking your
tests helps you detect when they use your project, pytest, or testing libraries
incorrectly.
TIP
Running mypy on your test suite also type-checks the public API of your project. This can be a good
fallback when you’re unable to fully type your implementation code for every supported Python
version.
Example 10-6 extends the Nox session to type-check your test suite. Install
your test dependencies, so mypy has access to type information for pytest
and friends.
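A sketch of what the extended session might look like (details assumed):

@nox.session(python=["3.12", "3.11", "3.10"])
def mypy(session: nox.Session) -> None:
    session.install(".[tests]", "mypy")
    session.run("mypy", "src", "tests")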
The test suite imports your package from the environment. The type
checker therefore expects your package to distribute type information. Add
an empty py.typed marker file to your import package, next to the
__init__ and __main__ modules (see “Distributing Types with
Python Packages”).
There isn’t anything inherently special about typing a test suite. Recent
versions of pytest come with high-quality type annotations. These help
when your tests use one of pytest’s built-in fixtures. Many test functions
don’t have arguments and return None . Here’s a slightly more involved
example using a fixture and test from Chapter 6:
import io

import pytest

from random_wikipedia_article import Article, show

@pytest.fixture
def file() -> io.StringIO:
    return io.StringIO()
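A matching test might look like this (the sample values and the assertion are
assumptions):

def test_show_includes_title(file: io.StringIO) -> None:
    article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
    show(article, file)
    assert "Lorem Ipsum" in file.getvalue()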
Recall that the fetch function instantiates the class like this:
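return Article(data["title"], data["extract"])

Yet the Article class never defines an __init__ method. So where does the
initializer come from?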
The Zen of Python says, “Special cases aren’t special enough to break the
rules.”8 Dataclasses are no exception to this principle: they’re plain
Python classes without any secret sauce. Given that the class doesn’t define
the __init__ method itself, there’s only one possible origin for it: the @dataclass
class decorator. In fact, the decorator synthesizes the __init__ method
on the fly, along with several other methods, using your type annotations!
Don’t take my word for it, though. In this section, you’re going to write
your own miniature @dataclass decorator.
WARNING
Don’t use this in production! Use the standard dataclasses module, or better: the attrs
library. Attrs is an actively maintained, industrial-strength implementation with better performance, a
clean API, and additional features, and it directly inspired dataclasses .
First of all, be a good typing citizen and think about the signature of the
@dataclass decorator. A class decorator accepts a class and returns it,
usually after transforming it in some way, such as by adding a method. In
Python, classes are objects you can pass around and manipulate to your
liking.
The typing language allows you to refer to, say, the str class by writing
type[str] . You can read this aloud as “the type of a string”. (You can’t
use str on its own here. In a type annotation, str just refers to an
individual string.) A class decorator should work for any class object,
though—it should be generic. Therefore, you’ll use a type variable instead
of an actual class like str :9
from typing import dataclass_transform

@dataclass_transform()  # lets static type checkers treat this like @dataclass
def dataclass[T](cls: type[T]) -> type[T]:
    ...
With the function signature out of the way, let’s think about how to
implement the decorator. You can break this down into two steps. First,
you’ll need to assemble a string with the source code of the __init__
method, using the type annotations on the dataclass. Second, you can use
Python’s built-in exec function to evaluate that source code in the
running program.
Example 10-8.
Use a type variable T in the signature to make this generic for any
class.
Retrieve the annotations of the class as a dictionary of names and
types.
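A sketch of what such a helper could look like; the function name matches the
one called in Example 10-9 below, but the body here is an assumption:

import inspect

def build_dataclass_init[T](cls: type[T]) -> str:
    # Retrieve the annotations of the class as a dictionary of names and types.
    annotations = inspect.get_annotations(cls)
    # Build one parameter and one assignment per annotated field.
    args = ", ".join(annotations)
    body = "\n".join(f"    self.{name} = {name}" for name in annotations)
    return f"def __init__(self, {args}) -> None:\n{body or '    pass'}\n"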
You can now pass the source code to the exec built-in. Apart from the
source code, this function accepts dictionaries for the global and local
variables.
globals = sys.modules[cls.__module__].__dict__
For the local variables, you can pass an empty dictionary—this is where
exec will place the method definition. All that’s left is to copy the
method from the locals dictionary into the class object and return the class.
Without further ado, Example 10-9 shows the entire decorator.
Example 10-9. Your own @dataclass decorator
import sys
from typing import dataclass_transform

@dataclass_transform()
def dataclass[T](cls: type[T]) -> type[T]:
    sourcecode = build_dataclass_init(cls)
    globals = sys.modules[cls.__module__].__dict__
    locals = {}
    exec(sourcecode, globals, locals)
    cls.__init__ = locals["__init__"]
    return cls
Retrieve the global variables from the module that defines the class.
This is where the magic happens: let the interpreter compile the
generated code on the fly.
There’s more you can do with types at runtime besides generating class
boilerplate. One important example is runtime type checking. To see how
useful this technique is, let’s take another look at the fetch function:
def fetch(url: str) -> Article:
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return Article(data["title"], data["extract"])
If you’ve paid close attention, you may have noticed that fetch is not
type-safe. Nothing guarantees that the Wikipedia API will return a JSON
payload of the expected shape. You might object that Wikipedia’s OpenAPI
specification tells us exactly which data shape to expect from the endpoint.
But don’t base your static types on assumptions about external systems—
unless you’re happy with your program crashing when a bug or API change
breaks those assumptions.
As you may have guessed, mypy silently passes over this issue, because
json.load returns Any . How can we make the function type-safe? As
a first step, let’s replace Any with the JSON type you defined in “Type
Aliases”:
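That is, annotate the parsed response (a sketch based on the fetch function
shown above):

def fetch(url: str) -> Article:
    with urllib.request.urlopen(url) as response:
        data: JSON = json.load(response)
    return Article(data["title"], data["extract"])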
We haven’t fixed the bug, but at least mypy gives us diagnostics now
(edited for brevity):
$ py -m mypy src
error: Value of type "..." is not indexable
error: No overload variant of "__getitem__" matches argument type "str"
error: Argument 1 to "Article" has incompatible type "..."
error: Invalid index type "str" for "JSON"; expected type "..."
error: Argument 2 to "Article" has incompatible type "..."
Found 5 errors in 1 file (checked 1 source file)
Mypy’s diagnostics boil down to two separate issues in the function. First,
the code indexes data without verifying that it’s a dictionary. Second, it
passes the results to Article without making sure they’re strings.
Let’s check the type of data then—it has to be a dictionary with strings
under the title and extract keys. You can express this concisely
using structural pattern matching:
match data:
    case {"title": str(title), "extract": str(extract)}:
        return Article(title, extract)
    case _:
        raise ValueError("invalid response")  # fallback; the exact message is assumed
The function is type-safe now, but can we do better than this? The
validation code duplicates the structure of the Article class—you
shouldn’t need to spell out the types of its fields again. If your application
must validate more than one input, the boilerplate can hurt readability and
maintainability. It should be possible to assemble articles from JSON
objects using only the original type annotations—and it is.
For this last iteration on the Wikipedia example, add cattrs to your
dependencies:
[project]
dependencies = ["cattrs>=23.2.3"]
Replace the fetch function with the three-liner below (don’t run this yet,
we’ll get to the final version in a second):
import cattrs
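# A sketch of the cattrs-based fetch (the book's exact listing may differ):
def fetch(url: str) -> Article:
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return cattrs.structure(data, Article)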
However, you still need to take care of one complication. The summary
attribute doesn’t match the name of its corresponding JSON field,
extract . Fortunately, cattrs is flexible enough to let you create a
custom converter that renames the field on the fly:
import cattrs
from cattrs.gen import make_dict_structure_fn, override

converter = cattrs.Converter()
converter.register_structure_hook(
    Article,
    make_dict_structure_fn(
        Article,
        converter,
        summary=override(rename="extract"),
    ),
)
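With the custom converter in place, fetch uses it instead of the module-level
function (again a sketch):

def fetch(url: str) -> Article:
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return converter.structure(data, Article)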
There are also practical advantages to the cattrs approach. You can
serialize the same object in different ways if you need to. It’s not intrusive
—it doesn’t add methods to your objects. And it works with all kinds of
types: dataclasses, attrs-classes, named tuples, typed dicts, and even plain
type annotations like tuple[str, int] .
Static type checkers won’t catch every type-related error. In this case,
gradual typing obscured the issue—specifically, json.load returning
Any . Real-world code has plenty of situations like this. A library outside
of your control might have overly permissive type annotations—or none at
all. A bug in your persistence layer might load corrupted objects from disk.
Maybe mypy would have caught the issue, but you silenced type errors for
the module in question.
Dynamic code
Python code can be highly dynamic, forcing type annotations to be
permissive. Your assumptions about the code may be at odds with the
concrete types you end up with at runtime.
External systems
Data that crosses the boundary of your program (an API response, a file on
disk, user input) may not have the shape your annotations promise.
Third-party libraries
A dependency may ship overly permissive type annotations, or none at all,
leaving the type checker to assume Any.
This is where runtime type checking comes in. The Typeguard library checks
values against your type annotations while the program runs. Add it to your
project's dependencies:

[project]
dependencies = ["typeguard>=4.1.5"]
The checks can also be more elaborate. For example, you can use the
TypedDict construct to specify the precise shape of a JSON object
you’ve fetched from some external service, such as the keys you expect to
find and which types their associated values should have:12
import json
from pathlib import Path
from typing import Any, TypedDict

from typeguard import check_type, typechecked

class Person(TypedDict):
    name: str
    age: int

def check(data: Any) -> Person:
    return check_type(data, Person)

@typechecked
def load_people(path: Path) -> list[Person]:
    with path.open() as io:
        return json.load(io)
By default, Typeguard only spot-checks the first item in a collection. You can
configure it to check every item:

import typeguard

typeguard.config.collection_check_strategy = (
    typeguard.CollectionCheckStrategy.ALL_ITEMS
)
Here's a Nox session that runs the test suite with the Typeguard plugin for
pytest:

package = "random_wikipedia_article"

@nox.session
def typeguard(session: nox.Session) -> None:
    session.install(".[tests]", "typeguard")
    session.run("pytest", f"--typeguard-packages={package}")
Running Typeguard as a pytest plugin lets you track down type-safety bugs
in a large codebase—provided it has good test coverage. If it doesn’t,
consider enabling runtime type checking for individual functions or
modules in production. Be careful here: Look for false positives from the
type checks, and measure their runtime overhead.
Summary
Type annotations let you specify the types of variables and functions in your
source code. You can use built-in types and user-defined classes, as well as
many higher-level constructs, such as union types, Any for gradual typing,
generics, and protocols. Stringized annotations and Self are useful for
handling forward references. The type keyword lets you introduce type
aliases.
Static type checkers like mypy leverage type annotations and type inference
to verify the type safety of your program without running it. Mypy
facilitates gradual typing by defaulting to Any for unannotated code. You
can and should enable strict mode where possible to allow for more
thorough checks. Run mypy as part of your mandatory checks, using a Nox
session for automation.
There’s a widespread sentiment that type annotations are for the sprawling
codebases found at giant tech corporations—and not worth the trouble for
reasonably sized projects, let alone the quick script you hacked together
yesterday afternoon. I disagree. Type annotations make your programs
easier to understand, debug, and maintain, no matter how large they are or
how many people work on them.
Try using types for any Python code you write. Ideally, configure your
editor to run a type checker in the background, if it doesn’t already come
with typing support out-of-the-box. If you feel that types get in your way,
consider using gradual typing—but also consider whether there might be a
simpler way to write your code that gives you type safety for free. If your
project has any mandatory checks, type checking should be a part of them.
Throughout the book, you’ve automated checks and tasks for your project
using Nox. Nox sessions allow you and other contributors to run checks
early and repeatedly during local development, in the same way they’d run
on a CI server. For reference, here’s a listing of the Nox sessions you’ve
defined:
The earlier you identify a software defect, the smaller the cost of fixing it.
In the best case, you discover issues while they’re still in your editor—their
cost is near zero. In the worst case, you ship the bug to production. Before
even starting to track down the issue in the code, you may have to roll back
the bad deployment and contain its impact. For this reason, shift all your
checks as far to the left on that imaginary timeline as possible.
(Run checks towards the right of the timeline, as well. End-to-end tests
against your production environments are invaluable for increasing
confidence that your systems are operating as expected.)
Mandatory checks in CI are the main gatekeeper: they decide which code
changes make it into the main branch and ship to production. But don’t wait
for CI. Run checks locally, as early as possible. Automating checks with
Nox and pre-commit helps achieve this goal.
Integrate linters and type checkers with your editor, as well! Alas, people
haven’t yet agreed on a single editor that everybody should use. Tools like
Nox give you a common baseline for local development in your teams.
Automation also greatly reduces the cost of project maintenance.
Contributors run a single command, such as nox , as an entrypoint to the
mandatory checks. Other chores, like refreshing lock files or generating
documentation, likewise only require simple commands. By encoding each
process, you eliminate human error and create a basis for constant
improvement.
Thank you for reading this book! While the book ends here, your journey
through the ever-shifting landscape of modern Python developer tooling
continues. Hopefully, the lessons from this book will remain valid and
helpful, as Python continues to reinvent itself.
1
Jukka Lehtosalo, “Our journey to type checking 4 million lines of Python,” September 5, 2019.
2
“Specification for the Python type system.” Last accessed: January 22, 2024.
3
Tin Tvrtković: “Python is two languages now, and that’s actually great,” February 27, 2023.
4
In a future Python version, this will work out of the box. See Larry Hastings: “PEP 649 – Deferred
Evaluation Of Annotations Using Descriptors”, January 11, 2021.
5
If you see an error message like “PEP 695 type aliases are not yet supported,” just omit the type
keyword for now. The type checker still interprets the assignment as a type alias. If you want to be
more explicit, you can use the typing.TypeAlias annotation from Python 3.10 upwards.
6
For brevity, I’ve removed error codes and leading directories from mypy’s output.
7
As of this writing, the upcoming release of factory-boy is expected to distribute types inline.
8
Tim Peters: “PEP 20 – The Zen of Python,” August 19, 2004.
9
As of this writing, mypy hasn’t yet added support for PEP 695 type variables. If you get a mypy
error, type-check the code in the Pyright playground instead or use the older TypeVar syntax.
10
In fact, the cattrs library is format-agnostic, so it doesn’t matter if you read the raw object from
JSON, YAML, TOML, or another data format.
11
If you’re interested in this topic, you should absolutely read Architecture Patterns in Python, by
Harry Percival and Bob Gregory (Sebastopol: O’Reilly, 2020).
12
This is less useful than it may seem. TypedDict classes must list every field even if you only use
a subset.
13
If you call check_type directly, you’ll need to pass the collection_check_strategy
argument explicitly.
About the Author
Claudio Jolowicz is a senior software engineer at Cloudflare with nearly
two decades of industry experience in Python and C++ and an open source
maintainer active in the Python community. He is the author of the
Hypermodern Python blog and project template, and co-maintainer of Nox,
a Python tool for test automation. In former lives, Claudio has worked as a
legal scholar and as a musician touring from Scandinavia to West Africa.
Get in touch with him on Mastodon: @[email protected]
Colophon
The animal on the cover of Hypermodern Python Tooling is the Peruvian
sheartail (Thaumastura cora), a member of the Mellisugini tribe of bee
hummingbirds.
The males of most species in this tribe have specialized tail feathers, often
used to produce sounds during courtship display. As shown on the cover of
this book, male Peruvian sheartails indeed sport very long, black and white
forked tails. The upperparts of both sexes are a luminous green, while
males’ throat feathers are a lustrous purple to magenta.
Due to its stable population, the Peruvian sheartail has been classified by
the IUCN as being of least concern from a conservation standpoint. Many
of the animals on O’Reilly covers are endangered; all of them are important
to the world.
The cover illustration is by Karen Montgomery, based on an antique line
engraving from Wood’s Natural History. The series design is by Edie
Freedman, Ellie Volckhausen, and Karen Montgomery. The cover fonts are
Gilroy Semibold and Guardian Sans. The text font is Adobe Minion Pro;
the heading font is Adobe Myriad Condensed; and the code font is Dalton
Maag’s Ubuntu Mono.