With Early Release ebooks, you get books in their earliest form—the
author’s raw and unedited content as they write—so you can take
advantage of these technologies long before the official release of these
titles.
Claudio Jolowicz
Hypermodern Python Tooling
by Claudio Jolowicz
Copyright © 2024 Claudio Jolowicz. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(https://oreilly.com). For more information, contact our
corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.
If you’ve picked up this book, you likely have Python installed on your
machine already. Most common operating systems ship with a python or
python3 command. This can be the interpreter used by the system itself or a
shim that installs Python for you when you invoke it for the first time.
Why dedicate an entire chapter to the topic if it’s so easy to get Python onto
a new machine? The answer is that installing Python for long-term
development can be a complex matter, and there are several reasons for this.
In this first chapter, I’ll show you how to install multiple Python versions
on some of the major operating systems in a sustainable way, and how to
keep your little snake farm in good shape.
TIP
If you only develop for a single platform, feel free to skip ahead to your preferred
platform. I’d encourage you to learn about working with Python on other operating
systems though. It’s fun—and familiarity with other platforms enables you to provide a
better experience to the contributors and users of your software.
NOTE
On Windows, PATH-based interpreter discovery is far less relevant because Python
installations can be located via the Windows Registry (see “The Python Launcher for
Windows”).
export PATH="/usr/local/opt/python/bin:$PATH"
Note that you’re adding the bin subdirectory instead of the installation root,
because that’s where the interpreter is normally located on these systems.
We’ll take a closer look at the layout of Python installations in Chapter 2.
The line above also works with the Zsh shell, which is the default on
macOS. That said, there’s a more idiomatic way to manipulate the search
path on Zsh:
typeset -U path
path=(/usr/local/opt/python/bin $path)
This instructs the shell to remove duplicate entries from the search path.
The shell keeps the path array synchronized with the PATH variable.
The Fish shell offers a function to uniquely and persistently prepend an
entry to the search path:
fish_add_path /usr/local/opt/python/bin
It would be tedious to set up the search path manually at the start of every
shell session. Instead, you can place the commands above in your shell
profile—this is a file in your home directory that is read by the shell on
startup. Table 1-1 shows the most common ones:
Table 1-1. The startup files of some common shells

Shell  Startup file
Bash   .bashrc
Zsh    .zshrc
Fish   .config/fish/config.fish
The order of directories on the search path matters because earlier entries
take precedence over, or “shadow”, later ones. You’ll often add Python
versions against a backdrop of existing installations, such as the interpreter
used by the operating system, and you want the shell to choose your
installations over those present elsewhere.
TIP
Unless your system already comes with a well-curated and up-to-date selection of
interpreters, prepend Python installations to the PATH environment variable, with the
latest stable version at the very front.
NOTE
Depending on your domain and target environment, you may prefer to use the Windows
Subsystem for Linux (WSL) for Python development. In this case, please refer to the
section “Installing Python on Linux” instead.
> py
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [...] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
By default, the Python Launcher selects the most recent version of Python
installed on the system. It’s worth noting that this may not be the same as
the most recently installed version on the system. This is good—you don’t
want your default Python to change when you install a bugfix release for an
older version.
If you want to launch a specific version of the interpreter, you can pass the
feature version as a command-line option:
> py -3.9
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [...] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
> py -V
Python 3.10.5
> py -3.9 -V
Python 3.9.13
Using the same mechanism, you can run a script on a specific interpreter:
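Here's a sketch, using a hypothetical script.py that starts with a #! line:

> py script.py

If the script begins with a shebang such as #!/usr/bin/env python3, the launcher
uses it to select the interpreter; you can still override the choice explicitly,
for example with py -3.9 script.py.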
NOTE
You may have recognized the #! line as what’s known as a shebang on Unix-like
operating systems. On these systems, the program loader uses it to locate and launch the
interpreter for the script.
If the script is installed as a module, you can also pass its import name to
the -m interpreter option:
> py -m module
Pip installs packages into the environment of the interpreter it runs on, so invoking it via the
Python Launcher lets you control where a package is installed. The explicit
form is almost always what you want, so you should prefer it over the
shorter pip install as a matter of routine.4
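For example, using httpx as a stand-in package:

> py -m pip install httpx
> py -3.9 -m pip install httpx

The first command installs the package into the default interpreter's environment,
the second into that of Python 3.9.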
As you have seen, the Python Launcher defaults to the newest version on
the system. There is an exception to this rule if a virtual environment is
active. In this case, py defaults to the interpreter in the virtual environment.
(You can think of virtual environments as lightweight satellites of a full
Python installation; I’ll talk more about them in Chapter 2.) In the following
example session using PowerShell, you create and activate a virtual
environment using an older interpreter version, then switch back to the
global environment:
> py -V
Python 3.10.5
> py -3.9 -m venv venv-3.9
> venv-3.9\Scripts\activate
(venv-3.9) > py -V
Python 3.9.13
(venv-3.9) > deactivate
> py -V
Python 3.10.5
WARNING
Do not pass the version to py when you have a virtual environment activated. This
would cause py to select the global Python installation, even if the version matches the
interpreter inside the active environment.
The Python Launcher defaults to the latest Python version on the system
even if that happens to be a prerelease. You can override this default
persistently by setting the PY_PYTHON and PY_PYTHON3 environment
variables to the current stable release:
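For example, a sketch assuming 3.10 is the current stable release (as in the
listing further down):

> setx PY_PYTHON 3.10
> setx PY_PYTHON3 3.10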
Restart the console for the setting to take effect. Don’t forget to remove
these variables once you upgrade to the final release.
To conclude our short tour of the Python Launcher, use the command py --list
to enumerate the interpreters on your system:
> py --list
-V:3.11 Python 3.11 (64-bit)
-V:3.10 * Python 3.10 (64-bit)
-V:3.9 Python 3.9 (64-bit)
In this listing, the asterisk marks the default version of Python.
Homebrew Python
Homebrew is a third-party package manager for macOS and Linux. It
provides an overlay distribution, an open-source software collection that
you install on top of the existing operating system. Installing the package
manager is straightforward; refer to the official website for instructions.
Homebrew distributes packages for every maintained feature version of
Python. Use the brew command-line interface to manage them:
brew install python@3.x
Install a new Python version.
You may find that you already have some Python versions installed for
other Homebrew packages that depend on them. Nonetheless, it’s important
that you install every version explicitly. Automatically installed packages
may get deleted when you run brew autoremove to clean up resources.
Homebrew places a python3.x command for each version on your PATH,
as well as a python3 command for its main Python package—which may
be either the current or the previous stable release. You should override this
to ensure both python and python3 point to the latest version. First, query
the package manager for the installation root (which is platform-
dependent):
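A sketch of the query, with python@3.x standing in for the versioned package
name:

$ brew --prefix python@3.x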
Next, prepend the bin and libexec/bin directories from this installation to
your PATH. Here’s an example that works on the Bash shell:
export PATH="/opt/homebrew/opt/[email protected]/bin:$PATH"
export PATH="/opt/homebrew/opt/[email protected]/libexec/bin:$PATH"
TIP
Personally, I recommend Homebrew for managing Python on macOS—it’s well-
integrated with the rest of the system and easy to keep up-to-date. Use the python.org
installers to test your code against prereleases, which are not available from Homebrew.
NOTE
After installing a Python version, run the Install Certificates command located in the
/Applications/Python 3.x/ folder. This command installs Mozilla’s curated collection of
root certificates.
When you install a bugfix release for a Python version that is already
present on the system, it will replace the existing installation. Uninstalling a
Python version is done by removing these two directories:
/Library/Frameworks/Python.framework/Versions/3.x/
/Applications/Python 3.x/
The system Python in a Linux distribution may be quite old, and not
every distribution includes alternate Python versions in their main
package repositories.
Linux distributions have mandatory rules about how applications and
libraries may be packaged. For example, Debian’s Python Policy
mandates that the standard ensurepip module must be shipped in a
separate package; as a result, you can’t create virtual environments on
a default Debian system (a situation commonly fixed by installing the
python3-full package.)
The main Python package in a Linux distribution serves as the
foundation for other packages that require a Python interpreter. These
packages may include critical parts of the system, such as Fedora’s
package manager dnf. Distributions therefore apply safeguards to
protect the integrity of the system; for example, they often limit your
ability to use pip outside of a virtual environment.
In the next sections, I’ll take a look at installing Python on two major Linux
distributions, Fedora and Ubuntu. Afterwards, I’ll cover some generic
installation methods that don’t use the official package manager. I’ll also
introduce you to the Python Launcher for Unix, a third-party package that
aims to bring the py utility to Linux, macOS, and similar systems.
Fedora Linux
Fedora is an open-source Linux distribution, sponsored primarily by Red
Hat, and the upstream source for Red Hat Enterprise Linux (RHEL). It aims
to stay close to upstream projects and uses a rapid release cycle to foster
innovation. Fedora is renowned for its excellent Python support, with Red
Hat employing several Python core developers.
Python comes pre-installed on Fedora, and you can install additional Python
versions using dnf, its package manager:
sudo dnf install python3.x
Install a new Python version.
Fedora has packages for all active feature versions and prereleases of
CPython, the reference implementation of Python, as well as packages with
alternative implementations like PyPy. A convenient shorthand to install all
of these at once is to install tox:
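A minimal sketch of that shorthand:

$ sudo dnf install tox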
In case you’re wondering, tox is a test automation tool that makes it easy to
run a test suite against multiple versions of Python (see Chapter 6); its
Fedora package pulls in most available interpreters as recommended
dependencies.
Ubuntu Linux
Ubuntu is a popular Linux distribution based on Debian and funded by
Canonical Ltd. Ubuntu only ships a single version of Python in its main
repositories; other versions of Python, including prereleases, are provided
by a Personal Package Archive (PPA). A PPA is a community-maintained
software repository on Launchpad, the software collaboration platform run
by Canonical.
Your first step on an Ubuntu system should be to add the deadsnakes PPA:
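A sketch of the commands (add-apt-repository is provided by the
software-properties-common package):

$ sudo add-apt-repository ppa:deadsnakes/ppa
$ sudo apt update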
You can now install Python versions using the apt package manager:
sudo apt install python3.x-full
Install a new Python version.
TIP
Always remember to include the -full suffix when installing Python on Debian and
Ubuntu. The python3.x-full packages pull in the entire standard library and up-to-
date root certificates. In particular, they ensure you’re able to create virtual
environments.
$ py -V
3.10.6
$ py -3.9 -V
3.9.13
$ py --list
3.10 │ /usr/local/opt/python@3.10/bin/python3.10
3.9  │ /usr/local/opt/python@3.9/bin/python3.9
3.8  │ /usr/local/opt/python@3.8/bin/python3.8
3.7  │ /usr/local/opt/python@3.7/bin/python3.7
The Python Launcher for Unix discovers interpreters by scanning the PATH
environment variable for pythonX.Y commands; in other words, invoking
py -X.Y is equivalent to running pythonX.Y. The main benefit of py is to
provide a cross-platform way to launch Python, with a well-defined default
when no version is specified: the newest interpreter on the system or the
interpreter in the active virtual environment.
The Python Launcher for Unix also defaults to the interpreter in a virtual
environment if the environment is named .venv and located in the current
directory or one of its parents. Unlike with the Windows Launcher, you
don’t need to activate the environment for this to work. For example, here’s
a quick way to get an interactive session with the rich console library
installed:
$ py -m venv .venv
$ py -m pip install rich
$ py
>>> from rich import print
>>> print("[u]Hey, universe![/]")
Hey, universe!
Entry points are a more sustainable way to create scripts, one that doesn't
rely on handcrafting shebang lines. We'll cover them in Chapter 3.
NOTE
In this section, we’ll use Pyenv as a build tool. If you’re interested in using Pyenv as a
version manager, please refer to the official documentation for additional setup steps.
The best way to install Pyenv on macOS and Linux is using Homebrew:
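A minimal sketch of the installation:

$ brew install pyenv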
One great benefit of installing Pyenv from Homebrew is that you’ll also get
the build dependencies of Python. If you use a different installation method,
check the Pyenv wiki for platform-specific instructions on how to set up
your build environment.
Display the available Python versions using the following command:
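A sketch of that command:

$ pyenv install --list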
As you can see, the list is quite impressive: Not only does it cover all active
feature versions of Python, it also includes prereleases, unreleased
development versions, almost every point release published over the past
two decades, and a wealth of alternative implementations, such as
GraalPython, IronPython, Jython, MicroPython, PyPy, and Stackless
Python.
You can build and install any of these versions by passing them to pyenv
install:
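For example, using the 3.x.y placeholder from the surrounding text:

$ pyenv install 3.x.y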
When using Pyenv as a mere build tool, as we’re doing here, you need to
add each installation to PATH manually. You can find its location using the
command pyenv prefix 3.x.y and append /bin to that. Here’s an
example for the Bash shell:
export PATH="$HOME/.pyenv/versions/3.x.y/bin:$PATH"
Conda requires shell integration to update the search path and shell prompt
when you activate or deactivate an environment. If you’ve installed
Miniforge from Homebrew, update your shell profile using the conda init
command with the name of your shell. For example:
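A sketch, assuming you use Bash (substitute your shell's name):

$ conda init bash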
Before you can use this Python installation, you need to activate the
environment:
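A sketch, using a placeholder environment name:

$ conda activate myenv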
This command will run in the active Conda environment. What’s great
about Conda is that it won’t upgrade Python to a release that’s not yet
supported by the Python libraries in the environment.
When you’re done working in the environment, deactivate it like this:
$ conda deactivate
Summary
In this chapter, you’ve learned how to install Python on Windows, macOS,
and Linux. The flow chart in Figure 1-3 should provide some guidance on
selecting the installation method that works best for you. You’ve also
learned how to use the Python Launcher to select interpreters installed on
your system. While the Python Launcher helps remove ambiguity when
selecting interpreters, you should still audit your search path to ensure you
have well-defined python and python3 commands.
Figure 1-3. Choosing an installation method for Python
1 While CPython is the reference implementation of Python, there are quite a few more to
choose from, from ports to other platforms (WebAssembly, Java, .NET, MicroPython) to
performance-oriented forks and reimplementations such as PyPy, Pyjion, Pyston, and Cinder.
2 Let’s take an example: At this time of writing, the long-term support (LTS) release of Debian
Linux ships patched versions of Python 2.7.13 and 3.5.3—both released half a decade ago. (To
be clear, this is an observation about real-world production environments, not about Debian.
Debian’s “testing” distribution, which is widely used for development, comes with a current
version of Python.)
3 Building Windows installers from source is beyond the scope of this book, but you can find a
good step-by-step guide on Stack Overflow.
4 As a side benefit, py -m pip install --upgrade pip is the only way to upgrade pip
without an Access denied error. Windows refuses to replace an executable while it’s still
running.
5 The UNIX command-line tools option places symbolic links in the /usr/local/bin directory,
which can conflict with Homebrew packages and other versions from python.org.
6 For historical reasons, framework builds use a different path for the per-user site directory,
the location where packages are installed if you invoke pip outside of a virtual environment
and without administrative privileges. This different installation layout can prevent you from
importing a previously installed package.
Chapter 2. Python
Environments
Let’s take a quick inventory—feel free to follow along on your own system:
The Python Interpreter
The executable that runs Python programs is named python.exe on
Windows and located at the root of the installation.1 On Linux and
macOS, the interpreter is named python3.x and stored in the bin
directory with a python3 symbolic link.
Python modules
Modules are containers of Python objects that you load via the import
statement. They are organized under Lib (Windows) or lib/python3.x
(Linux and macOS). While modules from the standard library are
distributed with Python, site packages are modules you install from the
Python Package Index (PyPI) or another source.
Entry-point scripts
These are executable files in Scripts (Windows) or bin (Linux and
macOS). They launch Python applications by importing and invoking
their entry-point function.
Shared libraries
Shared libraries contain native code compiled from low-level languages
like C. Their filenames end in .dll or .pyd on Windows, .dylib on
macOS, and .so on Linux. Some have a special entry point that lets you
import them as modules from Python—they’re known as extension
modules. Extension modules, in turn, may use other shared libraries
from the environment or the system.2
Headers
Python installations contain headers for the Python/C API, an
application programming interface for writing extension modules or
embedding Python as a component in a larger application. They are
located under Include (Windows) or include/python3.x (Linux and
macOS).
Static data
Python environments can also contain static data in various locations.
This includes configuration files, documentation, and any resource files
shipped with third-party packages.
The next sections take a closer look at the core parts of a Python
environment: the interpreter, modules, and scripts.
NOTE
By default, Python installations also include Tcl/Tk, a toolkit for creating graphical user
interfaces (GUIs) written in Tcl. The standard tkinter module allows you to use this
toolkit from Python.
The Interpreter
The Python interpreter ties the environment to three things:
a specific version of the Python language
a specific implementation of Python
a specific build of the interpreter
sys.implementation.name
The implementation of Python, such as "cpython" or "pypy"
sys.implementation.version
The version of the implementation, same as sys.version_info for
CPython
sys.executable
The location of the Python interpreter
sys.prefix
The location of the Python environment
sys.base_prefix
The location of the full Python installation, same as sys.prefix
outside of a virtual environment
sys.path
The list of directories searched when importing Python modules
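If you want to inspect these attributes on your own system, here's a small
sketch to run in an interactive session:

import sys

print(sys.implementation.name)  # for example, "cpython"
print(sys.executable)           # path of the running interpreter
print(sys.prefix)               # location of the current environment
print(sys.base_prefix)          # parent installation (differs inside a virtual environment)
print(sys.path)                 # module search path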
Python Modules
Modules come in various forms and shapes. If you’ve worked with Python,
you’ve likely used most of them already. Let’s go over the different kinds:
Simple modules
In the simplest case, a module is a single file containing Python source
code. The statement import string executes the code in string.py and
binds the result to the name string in the local scope.
Packages
Directories with __init__.py files are known as packages—they allow
you to organize modules in a hierarchy. The statement import
email.message loads the message module from the email package.
Namespace packages
Directories with modules but no __init__.py are known as namespace
packages. You use them to organize modules in a common namespace
such as a company name (say acme.unicycle and acme.rocketsled).
Unlike with regular packages, you can distribute each module in a
namespace package separately.
Extension modules
Binary extensions are dynamic libraries with Python bindings; an
example is the standard math module. People write them for
performance reasons or to make existing C libraries available as Python
modules. Their names end in .pyd on Windows, .dylib on macOS, and
.so on Linux.
Built-in modules
Some modules from the standard library, such as the sys and builtins
modules, are compiled into the interpreter. The variable
sys.builtin_module_names lists all of them.
Frozen modules
Some modules from the standard library are written in Python but have
their bytecode embedded in the interpreter. Originally, only core parts of
importlib got this treatment. Recent versions of Python freeze every
module that’s imported during interpreter startup, such as os and io.
NOTE
The term package carries some ambiguity in the Python world. It refers both to modules
and to the artifacts used for distributing modules, also known as distributions. Unless
stated otherwise, this book uses package as a synonym of distribution.
import importlib.metadata

distributions = importlib.metadata.distributions()
for distribution in sorted(distributions, key=lambda d: d.name):
    print(f"{distribution.name:30} {distribution.version}")
Entry-point Scripts
Package installers like pip can generate entry-point scripts for third-party
packages they install. Packages only need to designate the function that the
script should invoke. This is a handy method to provide an executable for a
Python application.
Platforms differ in how they let you execute entry-point scripts directly. On
Linux and macOS, they’re regular Python files with execute permission (see
Example 2-3). Windows embeds the Python code in a binary file in the
Portable Executable (PE) format—more commonly known as a .exe file.
The binary launches the interpreter with the embedded code.3
Example 2-3. The entry-point script for pydoc
#!/usr/local/bin/python3.11
import pydoc

if __name__ == "__main__":
    pydoc.cli()
Request the interpreter from the current environment using a shebang.
Load the module containing the designated entry-point function.
Check that the script wasn’t imported from another module.
Finally, call the entry-point function to start up the program.
NOTE
On Windows, you won’t find IDLE and pydoc in the Scripts directory. IDLE is available
from the Windows Start Menu. Pydoc does not come with an entry-point script—use py
-m pydoc instead.
Most environments also include an entry-point script for pip itself. You
should prefer the more explicit form py -m pip over the plain pip
command though. It gives you more control over the target environment for
the packages you install.
The script directory of a Python installation also contains some executables
that aren’t scripts, such as the interpreter, platform-specific variants of the
interpreter, and the python3.x-config tool used for the build configuration
of extension modules.
Linux distributions may have site packages and script directories under both
/usr and /usr/local. These systems allow only the official package manager
to write to the /usr hierarchy. If you install packages using pip with
administrative privileges, they end up in a parallel hierarchy under
/usr/local. (Don’t do this; use the package manager, the per-user
environment, or a virtual environment instead.)
INSTALLATION SCHEMES
Python describes the layout of environments using installation schemes.
Each installation scheme has a name and the locations of some well-
known directories: stdlib and platstdlib for the standard library,
purelib and platlib for third-party modules, scripts for entry-point
scripts, include and platinclude for headers, and data for data files.
The plat* directories are for platform-specific files like binary
extensions.
The sysconfig module defines installation schemes for the major
operating systems and the different kinds of environments—system-
wide installations, per-user installations, and virtual environments.
Downstream distributions like Debian and Fedora often register
additional installation schemes. The main customer of installation
schemes are package installers like pip, as they need to decide where
the various parts of a Python package should go.
You can print the installation scheme for the current environment using
the command py -m sysconfig. Example 2-4 shows how to list all
available installation schemes. (You’re not expanding configuration
variables like the installation root here; they’re only meaningful within
the current environment.)
Example 2-4. Listing installation schemes
import sysconfig
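# A hedged sketch of how such a listing might continue (assumed, not the
# book's original code); expand=False keeps configuration variables like
# the installation root unexpanded:
for scheme in sysconfig.get_scheme_names():
    print(scheme)
    for name, path in sysconfig.get_paths(scheme, expand=False).items():
        print(f"    {name} = {path}")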
Files Windows macOS (framework) Linux
You install a package into the per-user environment using pip install --
user. If you invoke pip outside of a virtual environment and pip finds that
it cannot write to the system-wide installation, it will also default to this
location. If the per-user environment doesn’t exist yet, pip creates it for you.
TIP
The per-user script directory may not be on PATH by default. If you install applications
into the per-user environment, remember to edit your shell profile to update the search
path. Pip issues a friendly reminder when it detects this situation.
Per-user environments are not isolated environments: You can still import
system-wide site packages if they’re not shadowed by per-user modules
with the same name. Likewise, distribution-owned Python applications can
see modules from the per-user environment. Applications in the per-user
environment also aren’t isolated from each other. In particular, they cannot
depend on incompatible versions of another package.
In “Installing Applications with Pipx”, I’ll introduce pipx, which lets you
install applications in isolated environments. It uses the per-user script
directory to put applications onto your search path, but relies on virtual
environments under the hood.
Virtual Environments
When you’re working on a Python project that uses third-party packages,
it’s usually a bad idea to install these packages into the system-wide
environment. There are two main reasons why you want to avoid doing this:
First, you’re polluting a global namespace. Testing and debugging your
projects gets a lot easier when you run them in isolated and reproducible
environments. If two projects depend on conflicting versions of the same
package, a single environment isn’t even an option. Second, your
distribution or operating system may have carefully curated the system-
wide environment. Installing and uninstalling packages behind the back of
its package manager introduces a real chance of breaking your system.
Virtual environments were invented to solve these problems. They’re
isolated from the system-wide installation and from each other. Under the
hood, a virtual environment is a lightweight Python environment that stores
third-party packages and a reference to its parent environment. Packages in
virtual environments are only visible to the interpreter in the environment.
You create a virtual environment with the command py -m venv dir. The
last argument is the location where you want the environment to exist—its
root directory. The directory tree of a virtual environment looks much like a
Python installation, except that some files are missing, most notably the
entire standard library. Table 2-4 shows the standard locations within a
virtual environment.
Table 2-4. Structure of a virtual environment
Files Windows Linux and macOS
On Linux and macOS, enter the commands below. There’s no need to spell
out the path to the interpreter if the environment uses the well-known name
.venv. The Python Launcher for Unix selects its interpreter by default.
$ py -m venv .venv
$ py -m pip install httpx
$ py
You might think that the interpreter must somehow hardcode the locations
of the standard library and site packages. That’s actually not how it works.
Rather, the interpreter looks at the location of its own executable and checks
its parent directory for a pyvenv.cfg file. If it finds one, it treats that file as a
landmark for a virtual environment and imports third-party modules from
the site packages directory beneath.
This explains how Python knows to import third-party modules from the
virtual environment, but how does it find modules from the standard
library? After all, they’re neither copied nor linked into the virtual
environment. Again, the answer lies in the pyvenv.cfg file: When you create
a virtual environment, the interpreter records its own location under the
home key in this file. If it later finds itself in a virtual environment, it looks
for the standard library relative to that home directory.
NOTE
The name pyvenv.cfg is a remnant of the pyvenv script which used to ship with Python.
The py -m venv form makes it clearer which interpreter you use to create the virtual
environment—and thus which interpreter the environment itself will use.
While the virtual environment has access to the standard library in the
system-wide environment, it’s isolated from its third-party modules.
Although not recommended, you can give the environment access to those
modules as well, using the --system-site-packages option when creating
the environment. The result is quite similar to the way a per-user
environment works.
How does pip know where to install packages? The short answer is that pip
asks the interpreter it’s running on, and the interpreter derives the location
from its own path—just like when you import a module.6 This is why it’s
best to run pip with an explicit interpreter using the py -m pip idiom. If
you invoke pip directly, the system searches your PATH and may come up
with the entry-point script from a different environment.
Virtual environments come with the version of pip that was current when
Python was released. This can be a problem when you’re working with an
old Python release. Create the environment with the option --upgrade-
deps to ensure you get the latest pip release from the Python Package
Index. This method also upgrades any additional packages that may be pre-
installed in the environment.
NOTE
Besides pip, virtual environments may pre-install setuptools for the benefit of legacy
packages that don’t declare it as a build dependency. This is an implementation detail
and subject to change, so don’t assume setuptools will be present.
Activation Scripts
Virtual environments come with activation scripts in the script directory—
these scripts make it more convenient to use a virtual environment from the
command line, and they’re provided for a number of supported shells and
command interpreters. Here’s the Windows example again, this time using
the activation script:
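A sketch of what that session might look like, using venv as the directory
name and httpx as the example package (both are placeholders here):

> py -m venv venv
> venv\Scripts\activate
(venv) > py -m pip install httpx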
TIP
You can provide a custom prompt using the option --prompt when creating the
environment. The special value . designates the current directory; it’s particularly useful
when you’re inside a project repository.
On macOS and Linux, you need to source the activation script to allow it to
affect your current shell session. Here’s an example for Bash and similar
shells:
$ source .venv/bin/activate
Environments come with activation scripts for some other shells, as well.
For example, if you use the Fish shell, source the supplied activate.fish
script instead.
On Windows, you can invoke the activation script directly. There’s an
Activate.ps1 script for PowerShell and an activate.bat script for cmd.exe.
You don’t need to provide the file extension; each shell selects the script
appropriate for it.
> venv\Scripts\activate
PowerShell on Windows doesn’t allow you to execute scripts by default, but
you can change the execution policy to something more suited to
development: The RemoteSigned policy allows scripts written on the local
machine or signed by a trusted publisher. On Windows servers, this policy
is already the default. You only need to do this once—the setting is stored in
the registry.
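A sketch of the PowerShell command for this:

> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser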
$ deactivate
$ mkdir bin
$ export PATH="$(pwd)/bin:$PATH"
$ py -m venv venvs/black
$ venvs/black/bin/python -m pip install black
Successfully installed black-22.12.0 [...]
Finally, you copy the entry-point script into the directory you created in the
first step—that would be a script named black in the bin or Scripts
directory of the environment:
$ cp venvs/black/bin/black bin
Now you can invoke black even though the virtual environment is not
active:
$ black --version
black, 22.12.0 (compiled: no)
Python (CPython) 3.11.1
On top of this simple idea, the pipx project has built a cross-platform
package manager for Python applications with a great developer
experience.
TIP
If there’s a single Python application that you should install on a development machine,
pipx is probably it. It lets you install, run, and manage all the other Python applications
in a way that’s convenient and avoids trouble.
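A sketch of one common way to get pipx itself onto your system, using the
per-user environment:

$ py -m pip install --user pipx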
$ py -m pipx ensurepath
If you don’t already have shell completion for pipx, activate it by following
the instructions for your shell, which you can print with this command:
$ pipx completions
With pipx installed on your system, you can use it to install and manage
applications from the Python Package Index (PyPI). For example, here’s
how you would install Black with pipx:
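A minimal sketch:

$ pipx install black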
$ pipx upgrade-all
$ pipx reinstall-all
$ pipx uninstall-all
$ pipx list
The commands above provide all the primitives to manage global developer
tools efficiently, but it gets better. Most of the time, you just want to use
recent versions of your developer tools. You don’t want the responsibility of
keeping the tools updated, reinstalling them on new Python versions, or
removing them when you no longer need them. Pipx allows you to run an
application directly from PyPI without an explicit installation step. Let’s use
the classic Cowsay app to try it:
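A sketch of the invocation; the exact argument syntax depends on the cowsay
version you get (recent releases take the message via a -t option):

$ pipx run cowsay -t moo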
TIP
Use pipx run [app] as the default method to install and run developer tools from
PyPI. Use pipx install [app] if you need more control over application
environments, for example if you need to install plugins. Replace [app] with the name
of the app.
By default, pipx installs applications on the same Python version that it runs
on itself. This may not be the latest stable version, particularly if you
installed pipx using a system package manager like Apt. I recommend
setting the environment variable PIPX_DEFAULT_PYTHON to the latest stable
Python if that’s the case. Many developer tools you run with pipx create
their own virtual environments; for example, virtualenv, Nox, tox, Poetry,
and Hatch all do. It’s worthwhile to ensure that all downstream
environments use a recent Python version by default.
$ export PIPX_DEFAULT_PYTHON=python3.11 # Linux and macOS (bash)
> setx PIPX_DEFAULT_PYTHON python3.11 # Windows
Under the hood, pipx uses pip as a package installer. This means that any
configuration you have for pip also carries over to pipx. A common use case
is installing Python packages from a private index instead of PyPI, such as a
company-wide package repository. You can use pip config to set the URL
of your preferred package index persistently:
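A sketch, with https://example.com standing in for your index URL:

$ pip config set global.index-url https://example.com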
Alternatively, you can set the package index for the current shell session
only. Most pip options are also available as environment variables:
$ export PIP_INDEX_URL=https://example.com
Both methods cause pipx to install applications from the specified index.
In this section, we’ll take a deep dive into the other mechanism that links
programs to an environment: module import, or more specifically, how the
import system locates Python modules for a program. In a nutshell, just like
the shell searches PATH for executables, Python searches sys.path for
modules. This variable holds a list of locations from where Python can load
modules—most commonly, directories on the local filesystem.
The machinery behind the import statement lives in importlib from the
standard library (see “Inspecting modules and packages with importlib”).
The interpreter translates every use of the import statement into an
invocation of the __import__ function from importlib. The importlib
module also exposes an import_module function that allows you to import
modules whose names are only known at runtime.
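A small sketch of the runtime variant:

import importlib

name = "json"  # imagine the name is computed at runtime
module = importlib.import_module(name)
print(module.dumps({"answer": 42}))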
Having the import system in the standard library has powerful implications:
You can inspect and customize the import mechanism from within Python.
For example, the import system supports loading modules from directories
and from zip archives out of the box. But entries on sys.path can be
anything really—say, a URL or a database query—as long as you register a
function in sys.path_hooks that knows how to find and load modules
from these path entries.
Module Objects
When you import a module, the import system returns a module object, an
object of type types.ModuleType. Any global variable defined by the
imported module becomes an attribute of the module object. This allows
you to access the module variable in dotted notation (module.var) from the
importing code.
Under the hood, module variables are stored in a dictionary in the __dict__
attribute of the module object. (This is the standard mechanism used to
store attributes of any Python object.) When the import system loads a
module, it creates a module object and executes the module’s code using
__dict__ as the global namespace. Somewhat simplified, it invokes the
built-in exec function like this:
exec(code, module.__dict__)
Additionally, module objects have some special attributes. For instance, the
__name__ attribute holds the fully-qualified name of the module, like
email.message. The __spec__ attribute holds the module spec, which I'll
talk about shortly. Packages also have a __path__ attribute, which contains
locations to search for submodules.
NOTE
Most commonly, the __path__ attribute of a package contains a single entry: the
directory holding its __init__.py file. Namespace packages, on the other hand, can be
distributed across multiple directories.
Idempotency
Importing modules can have side effects, for example by executing
module-level statements. Caching modules in sys.modules ensures that
these side effects happen only once. The import system also uses locks
to ensure that multiple threads can safely import the same module.
Recursion
Modules can end up importing themselves recursively. A common case
is circular imports, where module a imports module b, and b imports a.
The import system supports this by adding modules to sys.modules
before they’re executed. When b imports a, the import system returns
the (partially initialized) module a from the sys.modules dictionary,
thereby preventing an infinite loop.
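A short sketch that demonstrates the module cache:

import sys
import json

assert sys.modules["json"] is json  # the imported module is cached
import json as json_again
assert json_again is json           # a repeated import returns the cached object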
Module Specs
Conceptually, importing a module proceeds in two steps. First, given the
fully-qualified name of a module, the import system locates the module and
produces a module spec. The module spec
(importlib.machinery.ModuleSpec) contains metadata about the module
such as its name and location, as well as an appropriate loader for the
module. Second, the import system creates a module object from the
module spec and executes the module’s code. The module object includes
special attributes with most of the metadata from the module spec (see
Table 2-5). These two steps are referred to as finding and loading, and the
module spec is the link between them.
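Here's a sketch that inspects the spec of the standard email.message module:

import importlib.util

spec = importlib.util.find_spec("email.message")
print(spec.name)    # "email.message"
print(spec.origin)  # filesystem path of message.py
print(spec.loader)  # the loader that will execute the module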
Table 2-5. Attributes of Modules and Module Specs
The PathFinder is the central hub of the import machinery. It’s responsible
for every module that’s not embedded into the interpreter, and searches
sys.path to locate it.8 The path finder uses a second level of finder objects
known as path entry finders (importlib.abc.PathEntryFinder), each of
which finds modules under a specific location on sys.path. The standard
library provides two types of path entry finders, registered under
sys.path_hooks:
zipimport.zipimporter to import modules from zip archives
importlib.machinery.FileFinder to import modules from a
directory
NOTE
If you’re curious, you can find the built-in logic for constructing sys.path in the
CPython source code in Modules/getpath.py. Despite appearances, this is not an
ordinary Python module. When you build Python, the code in this file is frozen to
bytecode and embedded in the executable.
When the interpreter starts up, it constructs the module path in two steps.
First, it builds an initial module path using some built-in logic. Most
importantly, this initial path includes the standard library. Second, the
interpreter imports the site module from the standard library. The site
module extends the module path to include the site packages from the
current environment. In this section, we’ll take a look at how the interpreter
constructs the initial module path with the standard library. The next section
explains how the site module appends directories with site packages.
The locations on the initial module path fall into three categories, and they
occur in the order given below:
1. The current directory or the directory of the Python script (if any)
2. The locations in the PYTHONPATH environment variable (if set)
3. The locations of the standard library
WARNING
Unfortunately, having the working directory on sys.path is quite unsafe, as an attacker
(or you, mistakenly) can override the standard library by placing Python files in the
victim’s directory.
The location of the standard library is not hardcoded in the interpreter (see
“Virtual Environments”). Rather, Python looks for landmark files on the
path to its own executable, and uses them to locate the current environment
(sys.prefix) and the Python installation (sys.base_prefix). One such
landmark file is pyvenv.cfg, which marks a virtual environment and points
to its parent installation via the home key. Another landmark is os.py, the
file containing the standard os module: Python uses os.py to discover the
prefix outside of a virtual environment, and to locate the standard library
itself.
Site Packages
The interpreter constructs the initial sys.path early on during initialization
using a fairly fixed process. By contrast, the remaining locations on
sys.path—known as site packages—are highly customizable and under
the responsibility of a Python module named site.
The site module adds the following path entries if they exist on the
filesystem:
User site packages
This directory holds third-party modules from the per-user environment.
It’s in a fixed location that depends on the OS (see “The Per-User
Environment”). On Fedora and some other systems, there are two path
entries, for pure Python modules and extension modules, respectively.
Site packages
This directory holds third-party modules from the current environment,
which is either a virtual environment or a system-wide installation. On
Fedora and some other systems, pure Python modules and extension
modules are in separate directories. Many Linux systems also separate
distribution-owned site packages under /usr from local site packages
under /usr/local.
In the general case, the site packages are in a subdirectory of the standard
library named site-packages. If the site module finds a pyvenv.cfg file on
the interpreter path, it uses the same relative path as in a system installation,
but starting from the virtual environment marked by that file. The site
module also modifies sys.prefix to point to the virtual environment.
The site module provides a few hooks for customization:
.pth files
Within site packages directories, any file with a .pth extension can list
additional directories for sys.path, one directory per line. This works
similar to PYTHONPATH, except that modules in these directories will
never shadow the standard library. Additionally, .pth files can import
modules directly—the site module executes any line starting with
import as Python code. Third-party packages can ship .pth files to
configure sys.path in an environment. Some packaging tools use .pth
files behind the scenes to implement editable installs. An editable install
places the source directory of your project on sys.path, making code
changes instantly visible inside the environment.
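A sketch of what a .pth file might contain (the filename, path, and module
name are placeholders):

# example.pth, placed in a site-packages directory
/home/user/projects/example/src
import example_startup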
Summary
In this chapter, you’ve learned what Python environments are, where to find
them, and how they look on the inside. At the core, a Python environment
consists of the Python interpreter and Python modules, as well as entry-
point scripts to run Python applications. Environments are tied to a specific
version of the Python language.
There are three kinds of Python environments. Python installations are
complete, stand-alone environments with an interpreter and the full
standard library. Per-user environments are annexes to an installation where
you can install modules and scripts for a single user. Virtual environments
are lightweight environments for project-specific modules and scripts,
which reference their parent environment via a pyvenv.cfg file. They come
with an interpreter, which is typically a symbolic link or small wrapper for
the parent interpreter, and with activation scripts for shell integration. Use
the command py -m venv to create a virtual environment.
Finally, you’ve seen how Python uses sys.path to locate modules when
you import them, and how the module path is constructed during interpreter
startup. You’ve also learned how module import works under the hood,
using finders and loaders as well as the module cache. Interpreter discovery
and module import are the key mechanisms that link Python programs to an
environment at runtime.
1 There’s also a pythonw.exe executable that runs programs without a console window, like GUI
applications.
2 For example, the standard ssl module uses OpenSSL, an open-source library for secure
communication.
3 You can also execute a plain Python file on Windows if it has a .py or .pyw file extension—
Windows installers associate these file extensions with the Python Launcher and register them
in the PATHEXT environment variable. For example, Windows installations use this mechanism
to launch IDLE.
4 Framework builds on macOS use a version-specific directory for scripts, as well. Historically,
framework builds pioneered per-user installation before its standardization.
5 You could force the use of symbolic links on Windows via the --symlinks option—but don’t.
There are subtle differences in the way these work on Windows. For example, the File
Explorer resolves the symbolic link before it launches Python, which prevents the interpreter
from detecting the virtual environment.
6 Internally, pip queries the sysconfig module for an appropriate installation scheme, see
“Installation Schemes”. This module constructs the installation scheme using the build
configuration of Python and the location of the interpreter in the filesystem.
7 At the time of writing, pipx caches temporary environments for 14 days.
8 For modules located within a package, the __path__ attribute of the package takes the place
of sys.path.
Chapter 3. Python Packages
In this chapter you’ll learn how to package your Python projects for
distribution. A package is a single file containing an archive of your code
along with metadata that describes it, like the project name and version. You
can install this file into a Python environment using pip, the Python package
installer. You can also upload the package to a repository such as the Python
Package Index (PyPI), a public server operated by the Python community.
Having your package on PyPI means other people can install it, too—they
only need to pass its name to pip install.
NOTE
Python folks use the word package for two distinct concepts. Import packages are
Python modules that contain other modules, typically directories with an __init__.py
file. Distribution packages are archive files for distributing Python software—they are
the subject of this chapter.
Creating a package from your project makes it easy to share your code with
others. Packaging also has a less obvious benefit: Installing your project as
a package makes it a first-class citizen of a Python environment. The
metadata in a package specifies the minimum Python version and any third-
party packages it depends on. Installers ensure the environment matches
these prerequisites; they even install missing project dependencies and
upgrade those whose version doesn’t match the requirements. Once
installed, the package has an explicit link to the environment it’s installed
in. Compare this to running a script from your working directory, which
may well end up on an outdated Python version, or in an environment that
doesn’t have all the dependencies installed.
Figure 3-1 shows the typical lifecycle of a package. Everything starts with a
project: the source code of an application, library, or other piece of software
that you’re going to package for distribution (1). Next, you build a package
from the project, an installable artifact with a snapshot of your project at
this point in time (2). If author and user are the same person, they may
install this package directly into an environment, say, for testing (5). If they
are different people, it’s more practical to upload the package to a package
index (a fancy word for a package repository) (3). Think of a package index
as a file server specifically for software packages, which allows people to
retrieve packages by name and version. Once downloaded (4), a user can
install your package into their environment (5). In real life, tools often
combine downloading and installing, building and installing, and even
building and publishing, into a single command.
Figure 3-1. The Package Lifecycle
An Example Application
Many applications start out as small, ad-hoc scripts. Example 3-1 fetches a
random article from Wikipedia and displays its title and summary in the
console. The script restricts itself to the standard library, so it runs in any
Python 3 environment.
Example 3-1. Displaying an extract from a random Wikipedia article
import json
import textwrap
import urllib.request
API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
def main():
    with urllib.request.urlopen(API_URL) as response:
        data = json.load(response)

    print(data["title"])
    print()
    print(textwrap.fill(data["extract"]))

if __name__ == "__main__":
    main()
The API_URL constant points to the REST API of the English Wikipedia
—or more specifically, its /page/random/summary endpoint.
The urllib.request.urlopen invocation sends an HTTP GET request
to the Wikipedia API. The with statement ensures that the connection is
closed at the end of the block.
The response body contains the resource data in JSON format.
Conveniently, the response is a file-like object, so the json module can
load it like a file from disk.
The title and extract keys hold the title of the Wikipedia page and a
short plain text extract, respectively. The textwrap.fill function
wraps the text so that every line is at most 70 characters long.
Store this script in a file random_wikipedia_article.py and take it for a spin.
Here’s a sample run:
> py random_wikipedia_article.py
Jägersbleeker Teich
Why Packaging?
Sharing a script like Example 3-1 does not require packaging. You can
publish it on a blog or a hosted repository, or send it to friends by email or
chat. Python’s ubiquity, the “batteries included” approach of its standard
library, and its nature as an interpreted language make this possible. The
Python programming language predates the advent of language-specific
package repositories, and the ease of sharing modules with the world was a
boon to Python’s adoption in the early days.1
Distributing self-contained modules without packaging seems like a great
idea at first: You keep your projects free of packaging cruft. They require no
separate artifacts, no intermediate steps like building, and no dedicated
tooling. But using modules as the unit of distribution also comes with
limitations. Here are the pain points:
Distributing projects composed of multiple modules
At some point, your project will outgrow a (reasonably sized) single-file
module. Once you break it up into multiple files, it becomes more
cumbersome for users to consume your work, and for you to publish it.
Binary extensions
Python modules written in a compiled language like C or Rust require a
build step. Ideally, you’ll distribute pre-built binaries for the common
platforms. You may also publish a source archive as a fallback, with an
automated build step that runs on the end user’s machine during
installation.
Packaging solves all of these problems, and it’s quite easy to add. You drop
a declarative file named pyproject.toml into your project, a standard file that
specifies the project metadata and its build system. In return, you get
commands to build, publish, install, upgrade, and uninstall your package.
In summary, Python packages come with many advantages:
Packaging in a Nutshell
In this section, I’ll take you on a whirlwind tour of Python packaging.
Example 3-2 shows how to package the script from “An Example
Application” with the bare minimum of project metadata—the project name
and version. Place the script and the pyproject.toml file side-by-side in an
otherwise empty directory.
Example 3-2. A minimal pyproject.toml file
[project]
name = "random-wikipedia-article"
version = "0.1"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
TIP
Change the project name to a name that uniquely identifies your project. Projects on the
Python Package Index share a single namespace—their names are not scoped by the
users or organizations owning the projects.
$ py -m pip install .
Processing path/to/project
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: random-wikipedia-article
Building wheel for random-wikipedia-article (pyproject.toml) ... done
Created wheel for random-wikipedia-article: …
Stored in directory: …
Successfully built random-wikipedia-article
Installing collected packages: random-wikipedia-article
Successfully installed random-wikipedia-article-0.1
You can run the script by passing its import name to the -m interpreter
option:
$ py -m random_wikipedia_article
Invoking the script directly takes only a single line in the project.scripts
section. Example 3-3 tells the installer to generate an entry-point script
named after the project. The script invokes the main function from the
Python module.
Example 3-3. A pyproject.toml file with an entry-point script
[project]
name = "random-wikipedia-article"
version = "0.1"
[project.scripts]
random-wikipedia-article = "random_wikipedia_article:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
Let’s use pipx to streamline the process of installing the project into a
virtual environment and placing the script on your PATH. (If you activated a
virtual environment above, don’t forget to first deactivate it.)
$ pipx install .
installed package random-wikipedia-article 0.1, installed using Python
3.10.8
These apps are now globally available
- random-wikipedia-article
done!
$ random-wikipedia-article
When you’re making changes to the source code, it saves time to see those
changes reflected in the environment immediately, without repeatedly
installing the project. You could import your modules directly from the
source tree during development. Unfortunately, you’d no longer have the
installer check the requirements of your project, nor would you be able to
access project metadata at runtime.
Editable installs achieve the best of both worlds by installing your package
in a special way that redirects imports to the source tree. You can think of
this mechanism as a kind of “hot reloading” for Python packages. It works
with both pip and pipx:
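A sketch of the two invocations, run from the project directory:

$ py -m pip install --editable .
$ pipx install --editable .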
Once you’ve installed your package in this way, you won’t need to reinstall
it to see changes to the source code—only when you edit pyproject.toml to
change the project metadata or add a third-party dependency.
In Example 3-2, you designated hatchling as the build backend for your
project. You can see from the output above that build used hatchling to
perform the actual package build. In “Build Frontends and Build
Backends”, I’ll explain in more detail how the two tools interact to produce
packaging artifacts.
By delegating the work to hatchling, build creates an sdist and a wheel
for the project (see “Wheels and Sdists”). It then places these packages in
the dist directory.
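Assuming you have build and Twine installed and credentials for TestPyPI at
hand, building and uploading the packages might look something like this:
$ py -m build
$ twine upload --repository testpypi dist/*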
View at:
https://test.pypi.org/project/random-wikipedia-article/0.1/
Congratulations, you have published your first Python package! Let’s install
the package again, this time from the index instead of the project directory:
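For example, pointing pip at the TestPyPI index:
$ py -m pip install --index-url https://test.pypi.org/simple/ random-wikipedia-article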
Dictionaries are known as tables and come in several equivalent forms. You
can put the key/value pairs on separate lines, preceded by the table name in
square brackets:
[project]
name = "foo"
version = "0.1"
Alternatively, dotted keys let you spell out the full path to each key:
project.name = "foo"
project.version = "0.1"
For comparison, here's how the contents of Example 3-2 would look in JSON:
{
"project": {
"name": "random-wikipedia-article",
"version": "0.1"
},
"build-system": {
"requires": ["hatchling"],
"build-backend": "hatchling.build"
}
}
project
Holds the project metadata (see “Project Metadata”).
tool
Stores configuration for each tool used by the project. For example, the
Black code formatter uses tool.black for its configuration.
Figure 3-2 shows how the build frontend and build backend collaborate to
build a package. First, the build frontend creates a virtual environment, the
build environment. Second, it installs the build dependencies into this
environment—the packages listed under requires, which consist of the
build backend itself as well as, optionally, plugins for that backend. Third,
the build frontend triggers the actual package build by importing and
invoking the build backend interface. This is a module or object declared in
build-backend, which contains a number of functions with well-known
signatures for creating packages and related tasks.
Figure 3-2. Build Frontend and Build Backend
Under the hood, pip performs the equivalent of the following commands
when you install the project from its source directory:
$ py -m venv buildenv
$ buildenv/bin/python -m pip install hatchling
$ buildenv/bin/python
>>> import hatchling.build
>>> hatchling.build.build_wheel("dist")
'random_wikipedia_article-0.1-py3-none-any.whl'
>>>
$ py -m pip install dist/*.whl
The build tool first creates an sdist from the project, and then uses that to
create a wheel. Generally, a pure Python package has a single sdist and a
single wheel for a given release. Binary extension modules, on the other
hand, commonly come in wheels for a range of platforms and
environments.
WHEEL COMPATIBILITY TAGS
Installers select the appropriate wheel for an environment using three
so-called compatibility tags that are embedded in the name of each
wheel file:
Python tag
The target Python implementation
ABI tag
The target application binary interface (ABI) of Python, which
defines the set of symbols that binary extension modules can use to
interact with the interpreter
Platform tag
The target platform, including the processor architecture
numpy-1.24.0-cp311-cp311-macosx_10_9_x86_64.whl
cryptography-38.0.4-cp36-abi3-manylinux_2_28_x86_64.whl
The wheel for NumPy—a fundamental library for scientific computing
—targets a specific Python implementation and version (CPython 3.11),
operating system release (macOS 10.9 and above), and processor
architecture (x86-64).
The wheel for Cryptography—another fundamental library, with an
interface to cryptographic algorithms—demonstrates two ways to
reduce the build matrix for binary distributions: The stable ABI is a
restricted set of symbols that are guaranteed to persist across Python
feature versions (abi3), and the manylinux tag advertises compatibility
with a particular C standard library implementation (glibc 2.28 and
above) across a wide range of Linux distributions.
Let’s peek inside a wheel to a get a feeling for how Python code is
distributed. You can extract wheels using the unzip utility to see the files
installers would place in the site-packages directory. Execute the following
commands in a shell on Linux or macOS, preferably inside an empty
directory. If you’re on Windows, you can follow along using the Windows
Subsystem for Linux (WSL).
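For example, you might fetch and extract the attrs wheel like this, pinning the
version used below:
$ py -m pip download --no-deps attrs==22.2.0
$ unzip -q attrs-22.2.0-py3-none-any.whl
$ ls
attr  attrs  attrs-22.2.0-py3-none-any.whl  attrs-22.2.0.dist-info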
$ head -5 attrs-22.2.0.dist-info/METADATA
Metadata-Version: 2.1
Name: attrs
Version: 22.2.0
Summary: Classes Without Boilerplate
Home-page: https://www.attrs.org/
In our example, the wheel contains two import packages named attr and
attrs, as well as a .dist-info directory with administrative files. The
METADATA file contains the core metadata for the package, a standardized
set of attributes that describe the package for the benefit of installers and
other packaging tools. You can access the core metadata of installed
packages at runtime using the standard library:
>>> from importlib.metadata import metadata
>>> metadata("attrs")["Version"]
'22.2.0'
>>> metadata("attrs")["Summary"]
'Classes Without Boilerplate'
In the next section, you’ll see how to embed core metadata in your own
packages.
Project Metadata
Build backends write out core metadata fields based on what you specify in
the project table of pyproject.toml. Table 3-2 provides an overview of all
the fields you can use in the project table.
Table 3-2. The project table
Two fields are essential and mandatory for every package: project.name
and project.version. The project name uniquely identifies the project
itself. The project version identifies a release—a published snapshot of the
project during its lifetime. Besides the name and version, there are a
number of optional fields you can provide, such as the author and license, a
short text describing the project, or third-party packages used by the project
(see Example 3-4).
Example 3-4. A pyproject.toml file with project metadata
[project]
name = "random-wikipedia-article"
version = "0.1"
description = "Display extracts from random Wikipedia articles"
keywords = ["wikipedia"]
readme = "README.md"
license = { text = "MIT" }
authors = [{ name = "Your Name", email = "[email protected]" }]
classifiers = ["Topic :: Games/Entertainment :: Fortune Cookies"]
urls = { Homepage = "https://yourname.dev/projects/random-wikipedia-article" }
requires-python = ">=3.7"
dependencies = ["httpx>=0.23.1", "rich>=12.6.0"]
In the following sections, I’ll take a closer look at the various project
metadata fields.
NOTE
Most project metadata fields correspond to a core metadata field (and sometimes two).
However, their names and syntax differ slightly—core metadata standards predate
pyproject.toml by many years. As a package author, you can safely ignore the details of
this translation and focus on the project metadata.
Naming Projects
The project.name field contains the official name of your project.
[project]
name = "random-wikipedia-article"
Your users specify this name to install the project with pip. This field also
determines your project’s URL on PyPI. You can use any ASCII letter or
digit to name your project, interspersed with periods, underscores, and
hyphens. Packaging tools normalize project names for comparison: all
letters are converted to lowercase, and punctuation runs are replaced by a
single hyphen (or underscore, in the case of package filenames). For
example, Awesome.Package, awesome_package, and awesome-package all
refer to the same project.
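As a rough sketch of this normalization rule (the helper below is only
illustrative, not part of any packaging tool):
import re


def normalize(name):
    """Lowercase the name and collapse runs of punctuation into a hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()


assert normalize("Awesome.Package") == normalize("awesome_package") == "awesome-package"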
Project names are distinct from import names, the names users specify to
import your code. The latter must be valid Python identifiers, so they can’t
have hyphens or periods and can’t start with a digit. They’re case-sensitive
and can contain any Unicode letter or digit. As a rule of thumb, you should
have a single import package per distribution package and use the same
name for both (or a straightforward translation, like random-wikipedia-
article and random_wikipedia_article).
Versioning Projects
The project.version field stores the version of your project at the time
you publish the release.
[project]
version = "0.1"
You can also let the build backend compute a field during the package build; in
that case, list the field under dynamic instead of specifying it statically:
[project]
dynamic = ["version", "readme"]
NOTE
The goal of the standards behind pyproject.toml is to let projects define their metadata
statically, rather than rely on the build backend to compute the fields during the package
build. This benefits the packaging ecosystem, because it makes metadata accessible to
other tools. It also reduces cognitive overhead because build backends share a unified
configuration format and populate the metadata fields in a straightforward and
transparent way.
For example, if your module defines a __version__ attribute, you can mark the
version field as dynamic and point Hatch at the module:
__version__ = "0.2"

[project]
name = "random-wikipedia-article"
dynamic = ["version"]

[tool.hatch.version]
path = "random_wikipedia_article.py"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
This line marks the version field as dynamic.
This line tells Hatch where to look for the __version__ attribute.
The astute reader will have noticed that you don’t really need this
mechanism to avoid duplicating the version. You can also declare the
version in pyproject.toml as usual and read it from the installed metadata at
runtime:
Example 3-6. Reading the version from the installed metadata
from importlib.metadata import version
__version__ = version("random-wikipedia-article")
But don’t go and add this boilerplate to all your projects yet. Reading the
metadata from disk is not something you want to do during program startup.
Third-party libraries like click provide mature implementations that
perform the metadata lookup on demand, under the hood—for example,
when the user specifies a command-line option like --version.
Unfortunately, this is usually not enough to truly single-source the version.
It’s considered good practice to tag releases in your version control system
(VCS) using a command like git tag v1.0.0. Luckily, a number of build
backends come with plugins that extract the version number from Git,
Mercurial, and similar systems. This technique was pioneered by the
setuptools-scm plugin; for Hatch, you can use the hatch-vcs plugin (see
Example 3-7).
Example 3-7. Deriving the project version from the version control system
[project]
name = "random-wikipedia-article"
dynamic = ["version"]
[tool.hatch.version]
source = "vcs"
[build-system]
requires = ["hatchling", "hatch-vcs"]
build-backend = "hatchling.build"
If you build this project from a repository and you’ve checked out the tag
v1.0.0, Hatch will use the version 1.0.0 for the metadata. If you’ve
checked out an untagged commit, Hatch will instead generate a
developmental release like 0.1.dev1+g6b80314.2
Entry-point Scripts
Entry-point scripts are small executables that launch the interpreter from
their environment, import a module and invoke a function (see “Entry-point
Scripts”). Installers like pip generate them on the fly when they install a
package.
The project.scripts table lets you declare entry-point scripts. Specify
the name of the script as the key and the module and function that the script
should invoke as the value, using the format module:function.
[project.scripts]
random-wikipedia-article = "random_wikipedia_article:main"
This declaration allows users to invoke the program using its given name:
$ random-wikipedia-article
The project.gui-scripts table works the same way for programs with a graphical
user interface; on Windows, installers generate launchers for these entries that
run without a console window:
[project.gui-scripts]
random-wikipedia-article-gui = "random_wikipedia_article:gui_main"
Entry Points
Entry-point scripts are a special case of a more general mechanism called
entry points. Entry points allow you to register a Python object in your
package under a public name. Python environments come with a registry of
entry points, and any package can query this registry to discover and import
modules, using the function importlib.metadata.entry_points from the
standard library. Applications commonly use this mechanism to support
third-party plugins.
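For illustration, here's roughly how an application could discover and load the
plugins registered in a group named some_application; the group keyword argument
requires Python 3.10 or later:
from importlib.metadata import entry_points


def load_plugins():
    """Import every object registered under the some_application group."""
    return [entry.load() for entry in entry_points(group="some_application")]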
The project.entry-points table contains these generic entry points.
They use the same syntax as entry-point scripts, but are grouped in
subtables known as entry point groups. If you want to write a plugin for
another application, you register a module or object in its designated entry
point group.
[project.entry-points.some_application]
my-plugin = "my_plugin"
You can also register submodules using dotted notation, as well as objects
within modules, using the format module:object:
[project.entry-points.some_application]
my-plugin = "my_plugin.submodule:plugin"
Let’s look at an example to see how this works. Random Wikipedia articles
make for fun little fortune cookies, but they can also serve as test fixtures3
for developers of Wikipedia viewers and similar apps. Let’s turn the app
into a plugin for the Pytest testing framework. (Don’t worry if you haven’t
worked with Pytest yet; I’ll cover testing in depth in Chapter 6.)
Pytest allows third-party plugins to extend its functionality with test fixtures
and other features. It defines an entry point group for such plugins named
pytest11. You can provide a plugin for Pytest by registering a module in
this group. Let’s also add Pytest to the project dependencies.
[project]
dependencies = ["pytest"]
[project.entry-points.pytest11]
random-wikipedia-article = "random_wikipedia_article"
For simplicity, I’ve chosen the top-level module that hosted the main
function in Example 3-1. Next, extend Pytest with a test fixture returning a
random Wikipedia article, as shown in Example 3-8.
Example 3-8. Test fixture with a random Wikipedia article
import json
import urllib.request
import pytest
API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
@pytest.fixture
def random_wikipedia_article():
    with urllib.request.urlopen(API_URL) as response:
        return json.load(response)
You can try this out yourself in an active virtual environment in the project
directory:
$ py -m pip install .
$ py -m pytest test_wikipedia_viewer.py
============================= test session starts ==============================
platform darwin -- Python 3.11.1, pytest-7.2.1, pluggy-1.0.0
rootdir: ...
plugins: random-wikipedia-article-0.1
collected 1 item

test_wikipedia_viewer.py F                                                [100%]

=================================== FAILURES ===================================
____________________________ test_wikipedia_viewer _____________________________

    def test_wikipedia_viewer(random_wikipedia_article):
        print(random_wikipedia_article["title"])
        print(random_wikipedia_article["extract"])
>       assert False
E       assert False

test_wikipedia_viewer.py:4: AssertionError
----------------------------- Captured stdout call -----------------------------
Halgerda stricklandi
Halgerda stricklandi is a species of sea slug, a dorid nudibranch, a shell-less
marine gastropod mollusk in the family Discodorididae.
=========================== short test summary info ============================
FAILED test_wikipedia_viewer.py::test_wikipedia_viewer - assert False
============================== 1 failed in 1.10s ===============================
The project.authors and project.maintainers fields identify the people behind
the project; each entry is a table with name and email keys:
[project]
authors = [{ name = "Your Name", email = "[email protected]" }]
maintainers = [
{ name = "Alice", email = "[email protected]" },
{ name = "Bob", email = "[email protected]" },
]
The project.description field holds a short summary of the project, which PyPI
displays on the project page and in search results:
[project]
description = "Display extracts from random Wikipedia articles"
The project.readme field is typically a string with the relative path to the
file with the long description of your project. Common choices are
README.md for a description written in Markdown format and
README.rst for the reStructuredText format. The contents of this file
appear on your project page on PyPI.
[project]
readme = "README.md"
Instead of a string, you can also specify a table with file and content-
type keys.
[project]
readme = { file = "README", content-type = "text/plain" }
You can even embed the long description in the pyproject.toml file using the
text key.
[project]
readme.text = """
# Display extracts from random Wikipedia articles
"""
readme.content-type = "text/markdown"
The project.keywords field lists terms that help people find your project when
searching PyPI:
[project]
keywords = ["wikipedia"]
The project.classifiers field describes your project with a list of
standardized classifiers:
[project]
classifiers = [
"Development Status :: 3 - Alpha",
"Environment :: Console",
"Topic :: Games/Entertainment :: Fortune Cookies",
]
PyPI maintains the official registry of classifiers for Python projects. They
are known as Trove classifiers4 and consist of hierarchically organized
labels separated by double colons (see Table 3-4).
Table 3-4. Trove Classifiers
The project.urls table lets you point users to your project's homepage, source
code, issue tracker, and documentation:
[project.urls]
Homepage = "https://yourname.dev/projects/random-wikipedia-article"
Source = "https://github.com/yourname/random-wikipedia-article"
Issues = "https://github.com/yourname/random-wikipedia-article/issues"
Documentation = "https://readthedocs.io/random-wikipedia-article"
The License
The project.license field is a table where you can specify your project
license under the text key or by reference to a file under the file key. You
may also want to add the corresponding Trove classifier for the license.
[project]
license = { text = "MIT" }
classifiers = ["License :: OSI Approved :: MIT License"]
I recommend using the text key with a SPDX license identifier such as
“MIT” or “Apache-2.0”.5 The Software Package Data Exchange (SPDX) is
an open standard backed by the Linux Foundation for communicating
software bill of material information, including licenses.
If you’re unsure which open source license to use for your project,
choosealicense.com provides some useful guidance. For a proprietary
project, it’s common to specify “proprietary”. You can also add a special
Trove classifier to prevent accidental upload to PyPI.
[project]
license = { text = "proprietary" }
classifiers = [
"License :: Other/Proprietary License",
"Private :: No Upload",
]
The project.requires-python field specifies which Python versions your project
supports:
[project]
requires-python = ">=3.7"
Most commonly, people specify the minimum Python version as a lower
bound, using a string with the format >=3.x. The syntax of this field is more
general and follows the same rules as version specifiers for project
dependencies (see Chapter 4).
Tools like Nox and tox make it easy to run checks across multiple Python
versions, helping you ensure that the field reflects reality. As a baseline, I
recommend requiring the oldest Python version that still receives security
updates. You can find the end-of-life dates for all current and past Python
versions on the Python Developer Guide.
There are three main reasons to be more restrictive about the Python
version. First, your code may depend on newer language features—for
example, structural pattern matching was introduced in Python 3.10.
Second, your code may depend on newer features in the standard library—
look out for the “Changed in version 3.x” notes in the official
documentation. Third, it could depend on third-party packages with more
restrictive Python requirements.
Some packages declare upper bounds on the Python version, such as
>=3.7,<4. This practice is discouraged, but depending on such a package
may force you to declare the same upper bound for your own package.
Dependency solvers can’t downgrade the Python version in an
environment; they will either fail or, worse, downgrade the package to an
old version with a looser Python constraint. A future Python 4 is unlikely to
introduce the kind of breaking changes that people associate with the
transition from Python 2 to 3.
WARNING
Don’t specify an upper bound for the required Python version unless you know that your
package is not compatible with any higher version. Upper bounds cause disruption in
the ecosystem when a new version is released.
Dependencies and Optional Dependencies
The remaining two fields, project.dependencies and
project.optional-dependencies, list any third-party packages on which
your project depends. You’ll take a closer look at these fields—and
dependencies in general—in the next chapter.
Summary
Packaging allows you to publish releases of your Python projects, using
source distributions (sdists) and built distributions (wheels). These artifacts
contain your Python modules, together with project metadata, in an archive
format that end users can easily install into their environments. The
standard pyproject.toml file defines the build system for a Python project as
well as the project metadata. Build frontends like pip and build use the
build system information to install and run the build backend in an isolated
environment. The build backend assembles an sdist and wheel from the
source tree and embeds the project metadata. You can upload packages to
the Python Package Index (PyPI) or a private repository, using a tool like
Twine.
1 The Python Package Index (PyPI) did not come about for more than a decade. Even the
venerable Comprehensive Perl Archive Network (CPAN) did not exist in February 1991, when
Guido van Rossum published the first release of Python on Usenet.
2 In case you’re wondering, the +g6b80314 suffix is a local version identifier that designates
downstream changes, in this case using output from the command git describe.
3 Test fixtures set up objects that you need to run repeatable tests against your code.
4 The Trove project was an early attempt to provide an open-source software repository,
initiated by Eric S. Raymond.
5 As of this writing, a Python Enhancement Proposal (PEP) is under discussion that changes the
project.license field to a string using SPDX syntax and introduces a separate
project.license-files key for license files that should be distributed with the package (see
PEP 639).
6 You can also add Trove classifiers for each supported Python version. Some backends backfill
classifiers for you—Poetry does this out of the box for Python versions and project licenses.
Chapter 4. Dependency
Management
Example 4-1. Fetching extracts from random Wikipedia articles with httpx
import textwrap

import httpx

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
USER_AGENT = "random-wikipedia-article/0.1 (Contact: [email protected])"


def main():
    headers = {"User-Agent": USER_AGENT}
    with httpx.Client(headers=headers, follow_redirects=True) as client:
        response = client.get(API_URL)
        response.raise_for_status()
        data = response.json()

    print(data["title"])
    print()
    print(textwrap.fill(data["extract"]))
When creating a client instance, you can specify headers that it should
send with every request—like the User-Agent header. Using the client
as a context manager ensures that the network connection is closed at
the end of the with block.
This line performs two HTTP GET requests to the API. The first one
goes to the random endpoint, which responds with a redirect to the
actual article. The second one follows the redirect.
The raise_for_status method raises an exception if the server
response indicates an error via its status code.
The json method abstracts the details of parsing the response body as
JSON.
from rich.console import Console

console = Console(width=72, highlight=False)


def main():
    ...
    console.print(data["title"], style="bold")
    console.print()
    console.print(data["extract"])
Console objects provide a feature-rich print method for console
output. Setting the console width to 72 characters replaces our earlier
call to textwrap.fill. You’ll also want to disable automatic syntax
highlighting, since you’re formatting prose rather than data or code.
The style keyword allows you to set the title apart using a bold font.
$ py -m venv .venv
$ py -m pip install --editable .
At this point, you may be tempted to install httpx and rich manually into
the environment. Instead, add these packages to the project dependencies in
pyproject.toml. This ensures that whenever users install your project into an
environment, they install httpx and rich along with it.
[project]
name = "random-wikipedia-article"
version = "0.1"
dependencies = ["httpx", "rich"]
...
If you reinstall the project, you’ll see that pip installs the dependencies as
well:
$ py -m pip install --editable .
Version Specifiers
Version specifiers define the range of acceptable versions for a package.
When you add a new dependency, it’s a good idea to include its current
version as a lower bound—unless you know your project is compatible with
older releases (and needs to support them). Update this lower bound
whenever you start relying on newer features of the package.
[project]
dependencies = ["httpx>=0.23.1", "rich>=12.6.0"]
Having lower bounds on your dependencies may not seem to matter much,
as long as you install your package in an isolated environment. Installers
will choose the latest version for all your dependencies if there are no
constraints from other packages. But there are three reasons why you
should care. First, libraries are typically installed alongside other packages.
Second, even applications are not always installed in isolation; for example,
Linux distros may want to package your application for the system-wide
environment. Third, lower bounds help you detect version conflicts in your
own dependency tree early. This can happen if you depend on a recent
release of some package, while one of your other dependencies only works
with older releases of that package.
What about upper bounds—should you guard against newer releases that
might break your project? I recommend avoiding upper bounds unless you
know your project is incompatible with the new version of a dependency
(see “Upper Version Bounds in Python”). Later in this chapter, I’ll talk
about lock files. These request “known good” versions of your
dependencies when deploying services and when running automated
checks. Lock files are a much better solution to dependency-induced
breakage than upper bounds.
If a botched release breaks your project, publish a bugfix release to exclude
that specific broken version:
[project]
dependencies = ["awesome>=1.2,!=1.3.1"]
If you expect further incompatible releases, you can also cap the dependency at
the next major version, although this comes with the drawbacks discussed in
"Upper Version Bounds in Python":
[project]
dependencies = ["awesome>=1.2,<2"]
WARNING
Excluding versions after the fact has a pitfall that you need to be aware of. Dependency
resolvers can decide to downgrade your project to a version without the exclusion, and
upgrade the dependency anyway.
UPPER VERSION BOUNDS IN PYTHON
Some people routinely include upper bounds in version constraints,
especially for dependencies that follow Semantic Versioning. In this
widely adopted versioning scheme, major versions signal any breaking
changes, or incompatible changes to the public API. As engineers, we
err on the side of safety to build robust products, so at first glance,
guarding against major releases seems like something any responsible
person would do. Even if most of them don’t break your project, isn’t it
better to opt in after you have a chance to test the release?
Unfortunately, upper version bounds quickly lead to unsolvable
dependency conflicts. Python environments (unlike Node.js
environments, in particular) can only contain a single version of each
package. Libraries that put upper bounds on their own dependencies
also prevent downstream projects from receiving security and bug fixes
for those packages. Before adding an upper version bound, I therefore
recommend that you carefully consider the costs and benefits.
What exactly constitutes a breaking change is less defined than it may
seem. For example, should a project increment its major version every
time it drops support for an old Python version? Even in clear cases, a
breaking change will only break your project if it affects the part of the
public API that your project actually uses. By contrast, many changes
that will break your project are not marked by a version number
(they’re simply bugs). In the end, you’ll still rely on automated tests to
discover “bad” versions and deal with them after the fact.
Version specifiers support several operators, as shown in Table 4-1. You can
use the equality and comparison operators you know from Python: ==, !=,
<=, >=, <, and >. The == operator also supports wildcards (*), albeit only at
the end of the version string; in other words, you can require the version to
match a particular prefix, such as 1.2.*. There’s also a === operator to
perform a simple character-by-character comparison, which is best used as
a last resort for non-standard versions. Finally, the compatible release
operator ~= specifies that the version should be greater than or equal to the
given value, while still starting with the same prefix. For example, ~=1.2.3
is equivalent to >=1.2.3,==1.2.*, and ~=1.2 is equivalent to
>=1.2,==1.*.
Table 4-1. Version Specifiers
You don’t need upper bounds to exclude pre-releases, by the way. Even
when pre-releases are newer than the latest stable release, version specifiers
exclude them by default due to their expected instability. There are only
three situations where pre-releases are valid candidates: when they’re
already installed, when they’re the only ones satisfying the dependency
specification, and when you request them explicitly, using a clause like
>=1.0.0rc1.
Extras and Optional Dependencies
Suppose you want random-wikipedia-article to speak the newer HTTP/2 protocol
when it talks to the Wikipedia API. With httpx, this takes only a small change
to the code:
def main():
    headers = {"User-Agent": USER_AGENT}
    with httpx.Client(headers=headers, http2=True) as client:
        ...
Under the hood, httpx delegates the gory details of speaking HTTP/2 to
another package (named h2). That dependency is not pulled in by default,
however. This way, users who don’t need the newer protocol get away with
a smaller dependency tree. You do need it here, so activate the optional
feature using the syntax httpx[http2]:
[project]
dependencies = ["httpx[http2]>=0.23.1", "rich>=12.6.0"]
Packages declare their extras in the optional-dependencies table of the project
metadata. Here's, in simplified form, how httpx defines the extras for HTTP/2
and Brotli compression support:
[project.optional-dependencies]
http2 = ["h2>=3,<5"]
brotli = ["brotli"]
Environment Markers
The third piece of metadata you can specify for a dependency is
environment markers. But before I explain what these are, let me show you
an example of where they come in handy.
If you looked at the User-Agent header in Example 4-1 and thought, “I
should not have to repeat the version number in the code”, you’re
absolutely right. As you’ve seen in “Single-Sourcing the Project Version”,
you can read the version of your package from its metadata in the
environment. Example 4-4 shows how you can use the function
importlib.metadata.metadata to construct the User-Agent header from
the Name, Version, and Author-email core metadata fields. These fields
correspond to the name, version, and authors fields in the project
metadata.3
Example 4-4. Using importlib.metadata to build a User-Agent header
from importlib.metadata import metadata

USER_AGENT = "{Name}/{Version} (Contact: {Author-email})"


def build_user_agent():
    fields = metadata("random-wikipedia-article")
    return USER_AGENT.format_map(fields)


def main():
    headers = {"User-Agent": build_user_agent()}
    ...
The metadata function retrieves the core metadata fields for the
package.
The str.format_map function replaces each placeholder using a
lookup in the provided mapping (the core metadata fields).
The importlib.metadata module only joined the standard library in Python 3.8.
If you still support older Python versions, you can depend on its backport from
PyPI, importlib-metadata, and use an environment marker to restrict the
dependency to the interpreters that actually need it:
[project]
dependencies = [
"httpx[http2]>=0.23.1",
"rich>=12.6.0",
"importlib-metadata>=5.2.0; python_version < '3.8'",
]
In your code, fall back to the backport when the standard library module isn't
available:
try:
    from importlib.metadata import metadata
except ImportError:
    from importlib_metadata import metadata
[project]
dependencies = [""" \
awesome-package; python_full_version <= '3.8.1' \
and (implementation_name == 'cpython' or implementation_name == 'pypy') \
and sys_platform == 'darwin' \
and 'arm' in platform_version \
"""]
I’ve also relied on TOML’s support for multi-line strings here, which uses
triple quotes just like Python. Dependency specifications cannot span
multiple lines, so you have to escape any newlines with a backslash.
Table 4-2. Environment Markers
In a package's core metadata, an extra appears as a Provides-Extra field, and
its dependencies carry an extra marker. Here's how the http2 extra of httpx
shows up:
Provides-Extra: http2
Requires-Dist: h2<5,>=3; extra == 'http2'
Development Dependencies
Development dependencies are third-party packages that you require during
development. As a developer, you might use the pytest testing framework
to run the test suite for your project, the Sphinx documentation system to
build its docs, or a number of other tools to help with project maintenance.
Your users, on the other hand, don’t need to install any of these packages to
run your code.
As a concrete example, let’s add a small test for the build_user_agent
function from Example 4-4. Create a module
test_random_wikipedia_article.py in your project directory with the
code from Example 4-5.
Example 4-5. Testing the generated User-Agent header
from random_wikipedia_article import build_user_agent
def test_build_user_agent():
    assert 'random-wikipedia-article' in build_user_agent()
Import the function under test, build_user_agent.
Define the test function; pytest looks for functions whose names start
with test.
Use the assert statement to check for the project name in the generated
header.
You could just import and run the test from Example 4-5 manually. But
even for a tiny test like this, pytest adds three useful features. First, you
can run the test by invoking pytest without arguments—and this is true for
any other tests you may add to your project. Second, pytest produces a
summary with the test results. Third, pytest rewrites assertions in your
tests to give you friendly, informative messages when they fail.
Let’s run the test with pytest. Create and activate a virtual environment in
your project, then enter the commands below to install and run pytest
alongside your project:
$ py -m pip install .
$ py -m pip install pytest
$ py -m pytest
========================= test session starts ==========================
platform darwin -- Python 3.11.1, pytest-7.2.1, pluggy-1.0.0
rootdir: ...
plugins: anyio-3.6.2
collected 1 item
test_random_wikipedia_article.py . [100%]
For now, things look great. Tests ensure that your project can evolve
without breaking things, and the test for build_user_agent is a first step
in that direction. Installing and running the test runner is a small
infrastructure cost compared to these long-term benefits.
But how do you ensure that tools like pytest are set up correctly? Each of
your projects is going to have slightly different requirements. You could add
a small note to the project README for your contributors (and your future
self). But eventually there may be more tools: plugins for pytest, tools to
build the documentation, tools to analyze code for common bugs. You could
add those tools to the project dependencies. But that would be wasteful;
your users don’t need those packages to run your code.
Python doesn’t yet have a standard way to declare the development
dependencies of a project. Generally, people use one of three approaches to
fill the gap: optional dependencies, requirements files, or dependency
groups. In this section you’ll learn how to declare development
dependencies using extras and optional dependencies. Requirements files
allow you to list dependencies in a packaging-agnostic way, outside of the
project metadata; I’ll introduce them in the next section. Dependency
groups are a feature of project managers, which I’ll cover in Chapter 5.
Let’s recap why keeping track of development dependencies is helpful:
You don’t need to remember how to set up the development
environment for each project.
You make life easier for any potential contributors as well.
It helps with automating checks, both locally and in Continuous
Integration (CI).
You can make sure you get compatible versions of the tools. Your test
suite may not work with some versions of pytest, and your docs may
not build (or not look good) on all versions of Sphinx.
Extras are groups of optional dependencies that are recorded in the project
metadata (see “Extras and Optional Dependencies”). The extras mechanism
provides all the necessary ingredients to track development dependencies.
The packages aren’t installed by default, they can be grouped under
declarative names like tests or docs, and they come with the full
expressivity of dependency specifications, such as version constraints and
environment markers. Example 4-6 shows how you can use extras to
represent the development dependencies of a project.
Example 4-6. Using extras to represent development dependencies
[project.optional-dependencies]
tests = ["pytest>=7.2.1", "pytest-sugar>=0.9.6"]
docs = ["sphinx>=6.1.3"]
I’ve added pytest-sugar to the tests extra, which is a pytest plugin that
formats the test output in a nice way. There’s also a docs extra for building
documentation with Sphinx; I’ve added it to demonstrate that you can have
multiple groups of dependencies, but you won’t be using it in this chapter.
Contributors can now install the test dependencies using the tests extra:
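For example, with an editable install of the project:
$ py -m pip install --editable ".[tests]"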
You can also provide a dev extra that combines all the development
dependencies, to make it easy to set up a local development environment.
Instead of repeating all the dependencies, you can just reference the other
extras, as shown in Example 4-7:4
Example 4-7. Providing a dev extra with all development dependencies
[project.optional-dependencies]
tests = ["pytest>=7.2.1", "pytest-sugar>=0.9.6"]
docs = ["sphinx>=6.1.3"]
dev = ["random-wikipedia-article[tests,docs]"]
Requirements Files
Unlike extras, requirements files are dependency specifications that aren’t
part of the project metadata. You share them with your contributors using
the version control system, not with your users using distribution packages,
which is a good thing. What’s more, requirements files don’t implicitly
include your project in the dependencies. That shaves off time from all
tasks that don’t need the project installed, such as documentation builds and
linting.
At their core, requirements files are plain text files where each line is a
dependency specification (see Example 4-8).
Example 4-8. A simple requirements.txt file
# requirements.txt
pytest>=7.2.1
pytest-sugar>=0.9.6
sphinx>=6.1.3
You can install the dependencies listed in a requirements file using pip:
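For example:
$ py -m pip install -r requirements.txt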
The file format is not standardized; in fact, each line of a requirement file is
essentially an argument for pip install. In addition to dependency
specifications (which are standardized), it can have URLs and file paths,
optionally prefixed by -e for an editable install, as well as global options
such as -r to include another requirements file. The file format also
supports Python-style comments (with a leading # character) and line
continuations (with a trailing \ character).
Requirements files are commonly named requirements.txt, but variations
are common. For example, you could have a dev-requirements.txt for
development dependencies or a requirements directory with one file per
dependency group. Let’s replicate Example 4-7 using the third option:
Example 4-9. Using requirements files to specify development dependencies
# requirements/tests.txt
-e .
pytest>=7.2.1
pytest-sugar>=0.9.6
# requirements/docs.txt
sphinx>=6.1.3
# requirements/dev.txt
-r tests.txt
-r docs.txt
NOTE
Paths in requirements.txt are evaluated relative to the current directory. However, if you
include other requirement files using -r, their paths are evaluated relative to the
including file.
Create and activate a virtual environment, then run the following commands
to install the development dependencies and run the test suite:
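For example, using the dev requirements from Example 4-9:
$ py -m pip install -r requirements/dev.txt
$ py -m pytest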
Locking Dependencies
You’ve specified your dependencies and development dependencies,
installed them in a development environment, run your test suite and
whichever other checks you have in place: everything looks good, and
you’re ready to deploy your code to production.
There’s just one little hitch. How can you be sure that you install the same
dependencies in production as you did when you ran your checks? The
more exposure your production code gets, the more worrying the possibility
that it might run with a buggy or, worse, hijacked dependency. It could be a
direct dependency of your project or a package deeper down in the
dependency tree—an indirect dependency.
Even if you take care to upgrade your dependencies to the latest version
when testing, a new release could come in just before you deploy. You can
also end up with different dependencies if your development environment
does not match the production environment exactly: the mismatch can
cause installers to evaluate environment markers and wheel compatibility
tags differently.6 Tooling configuration or state can also cause different
results—for example, pip might install from a different package index or
from a local cache.
The problem is compounded if one of your dependencies doesn’t provide
wheels for the target environment—and it’s common for binary extension
modules to lag behind when a new Python version sees the light. The
installer must then build a wheel from the sdist on the fly, which introduces
more uncertainty: your installs are now only as reproducible as your builds.
And in the worst case, that package could compute its own dependencies
dynamically during build time.
Presumably, somewhere in your deployment process, there’s a line like this:
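It might read, for example:
$ py -m pip install random-wikipedia-article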
The installer will honor all version constraints from the dependencies
table in pyproject.toml. But as you saw above, it won’t select the same
packages on every run and every system. You need a way to define the
exact set of packages required by your application, and you want its
environment to be an exact image of this package inventory. This process is
known as pinning, or locking, the project dependencies.
What if you replace each version range in pyproject.toml with a single
version? Here’s how that would look like for random-wikipedia-article:
[project]
dependencies = [
"httpx[http2]==0.23.3",
"rich==13.3.1",
"importlib-metadata==6.0.0; python_version < '3.8'",
]
There are a couple of problems with this approach. First, you’ve only
pinned the direct dependencies. The application communicates via HTTP/2
using h2, a dependency of httpx—but h2 isn’t mentioned in the list above.
Should you add indirect dependencies to the dependencies table? That list
would quickly become hard to maintain. And you’d start to rely on
implementation details of the packages you actually import in your code.
Second, you’ve lost valuable information about the packages with which
your application is compatible. Pip’s dependency resolver used that
information to compute the versions above in the first place, but you won’t
have it the next time you want to upgrade your dependencies. Losing that
information also makes it that much harder to install the application in a
different environment—for example, when your production environment
upgrades to a new Python release.
You can get a complete picture of your dependency tree by installing the
project into an empty environment and taking a snapshot with pip freeze, which
lists every installed package with its exact version:
$ py -m pip install .
$ py -m pip freeze
anyio==3.6.2
Brotli==1.0.9
certifi==2022.12.7
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==0.16.3
httpx==0.23.3
hyperframe==6.0.1
idna==3.4
markdown-it-py==2.1.0
mdurl==0.1.2
Pygments==2.14.0
random-wikipedia-article @ file:///path/to/project
rfc3986==1.5.0
rich==13.3.1
sniffio==1.3.0
You could store this list in requirements.txt and add the file to your project
repository—omitting the line with the project path. When deploying your
project to production, you could install the project and its dependencies like
this:
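For example, installing the pinned requirements first and the project itself
second:
$ py -m pip install -r requirements.txt
$ py -m pip install .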
If you’ve paid close attention, you may have noticed that the requirements
file didn’t list importlib-metadata. That’s because you ran pip freeze
in an environment with a recent Python version—importlib-metadata is
only required on Python 3.7 and below. If your production environment
uses such an old version of Python, your deployment will break: you need
to lock your dependencies in an environment that matches production.
TIP
Lock your dependencies on the same Python version, Python implementation, operating
system, and processor architecture as those used in production. If you deploy to multiple
environments, generate a requirements file for each one, using a naming convention like
win32-py311-requirements.txt.
Requirements files can also pin each package to specific SHA-256 hashes, which
pip verifies before installing anything:
httpx==0.23.3 \
    --hash=sha256:9818458eb565bb54898ccb9b8b251a28785dd4a55afbc23d0eb410754fe7d0f9 \
    --hash=sha256:a211fcce9b1254ea24f0cd6af9869b3d29aba40154e947d2a07bb499b3e310d6
The pip-tools project provides the pip-compile command for this purpose: it
reads the dependencies from pyproject.toml and writes a requirements.txt file
with pinned versions. Run it via the piptools module in the environment of your
project:
$ py -m piptools compile
You can also install pip-tools globally using pipx—but the same caveat
applies. The pipx-managed environment must closely match the target
environment for the requirements file. Specify the Python version and
implementation using the --python option of pipx install. Additionally,
use the --suffix option to rename the entry-point scripts with a suffix
indicating the interpreter on which they run. This will save you a headache
when one of your projects or environments needs a different interpreter than
the others—and it allows you to install multiple versions of pip-tools
globally.
For example, here’s how you’d install pip-tools for a Python 3.7
environment using PyPy. (This example assumes that your system has a
pypy3.7 command.)
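The command might look something like this, with a suffix that matches the
interpreter:
$ pipx install --python=pypy3.7 --suffix=-pypy3.7 pip-tools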
This gives you a global command to compile requirements for any project
using pypy3.7:
$ pip-compile-pypy3.7
$ py -m piptools compile \
    --resolver=backtracking --allow-unsafe --no-header --no-annotate
anyio==3.6.2
brotli==1.0.9
certifi==2022.12.7
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==0.16.3
httpx[brotli,http2]==0.23.3
hyperframe==6.0.1
idna==3.4
markdown-it-py==2.1.0
mdurl==0.1.2
pygments==2.14.0
rfc3986[idna2008]==1.5.0
rich==13.3.1
sniffio==1.3.0
As before, install the requirements file in the target environment using pip,
followed by the name of the package itself. There are a couple of pip
options you can use to harden the installation: the option --no-deps
ensures that you only install packages listed in the requirements file, and the
option --no-cache-dir prevents pip from reusing downloaded or locally
built artifacts.
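Put together, a hardened installation might look something like this, assuming
the package itself is available from your package index:
$ py -m pip install --no-deps --no-cache-dir -r requirements.txt random-wikipedia-article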
If you don’t want to create the target environment from scratch, you can use
pip-tools to synchronize it with the updated requirements file. Don’t install
pip-tools in the target environment for this, as your dependencies may
conflict with those of pip-tools. Instead, use pipx to install pip-tools
globally, then specify the --python-executable option to point it to the
target environment:
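For example, with the target environment's interpreter at venv/bin/python:
$ pip-sync --python-executable=venv/bin/python requirements.txt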
This example assumes that the target environment is under venv in the
current directory. On Windows, the interpreter path would be
venv\Scripts\python.exe instead. Also, pip-sync always removes the
project itself, so remember to re-install it after synchronizing the
dependencies.
So far, you’ve seen how to lock dependencies for reliable and reproducible
deployments, but locking is also beneficial during development. By sharing
the requirements file with your team and with contributors, you put
everybody on the same page: every developer uses the same dependencies
when running the test suite, building the documentation, or performing
similar tasks. And by using the same requirements during Continuous
Integration, you avoid surprises when developers publish their changes to a
shared repository. However, to truly reap these benefits, you’ll need to
widen your view to include development dependencies.
In “Development Dependencies”, you saw two ways to declare
development dependencies: extras and requirements files. Pip-tools
supports both as inputs. That’s right: you can use requirements files as
inputs for other requirements files. But let’s start with extras.
You can pass an extra to pip-compile using, well, the --extra option. If
your project has a dev extra, generate the requirements file for development
like this:
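For example, writing the locked development dependencies to a separate file
(the output name is only a convention):
$ py -m piptools compile --extra=dev --output-file=dev-requirements.txt pyproject.toml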
If you have finer-grained extras, the process is the same. You may want to
store the requirements files in a requirements directory instead to avoid
clutter.
WARNING
Unfortunately, pip-tools doesn’t currently support recursive extras like the dev extra in
Example 4-7. If you run into this limitation, either specify the development
dependencies in requirements files or duplicate the dependencies from the other extras.
If you specify development dependencies in requirements files outside of
the project metadata, pass each of these files to pip-tools in turn. By
convention, input requirements use the .in extension, while output
requirements—the locked ones—use the .txt extension. If you follow the
requirements.in naming convention, pip-tools will derive the names of the
output files as appropriate.
Example 4-10 shows how you’d set this up for the tests, docs, and dev
requirements from Example 4-9.
Example 4-10. Using requirements files to specify development
dependencies
# requirements/tests.in
pytest>=7.2.1
pytest-sugar>=0.9.6
# requirements/docs.in
sphinx>=6.1.3
# requirements/dev.in
-r tests.in
-r docs.in
Unlike in Example 4-9, I haven’t included the project itself in the input
requirements. If I did, pip-tools would insert the full project path, which
may not be the same for every developer. Instead, pass pyproject.toml
together with tests.in and dev.in to lock the entire set of dependencies
together. You’ll also have to specify the output file explicitly if you pass
more than a single input file. When installing from the resulting files,
remember to install the project as well.
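The exact file names and options may vary, but locking the tests and dev groups
might look something like this:
$ py -m piptools compile --output-file=requirements/tests.txt pyproject.toml requirements/tests.in
$ py -m piptools compile --output-file=requirements/dev.txt pyproject.toml requirements/dev.in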
You may wonder why I bothered to compile dev.txt at all. Couldn’t I have
just referenced the generated docs.txt and tests.txt files? In fact, it’s essential
to let the dependency resolver see the full picture—all the input
requirements. If you simply install separately locked requirements on top of
each other, you may well end up with conflicting dependencies.
Table 4-4 summarizes the command-line options for pip-compile you’ve
seen in this chapter:
Table 4-4. Selected command-line options for pip-compile
Summary
In this chapter, you’ve learned how to declare the project dependencies
using pyproject.toml and how to declare development dependencies using
either extras or requirements files. You’ve also learned how to lock
dependencies for reliable deployments and reproducible checks using pip-
tools. In the next chapter, you’ll see how the project manager Poetry helps
with dependency management using dependency groups and lock files.
1 In a wider sense, the dependencies of a project consist of all software packages that users
require to run its code. This includes the interpreter, the standard library, third-party modules,
and system libraries. Conda supports this generalized notion of dependencies, and so do distro-
level package managers like apt, dnf, and brew.
2 Its counterpart also has a name: “Look Before You Leap” (LBYL).
3 For the sake of simplicity, this code doesn’t handle multiple authors—which one ends up in
the header is undefined.
4 This technique is sometimes called recursive optional dependencies.
5 For completeness, there’s a fourth way to handle development dependencies, and that’s not to
declare them as project dependencies at all. Instead, you automate the environment creation
using a tool like Nox, tox, or Hatch and include the dependencies as part of that. Chapter 6
covers test automation with Nox in detail.
6 See “Environment Markers” and “Wheel Compatibility Tags”.
Chapter 5. Managing Projects
with Poetry
In the preceding chapters, you’ve seen all the building blocks for publishing
production-quality Python packages. You’ve learned how to write a
pyproject.toml for your project, how to create an environment and install
the project with venv and pip, and how to build packages and upload them
with build and twine.
By standardizing the build backend interface and project metadata,
pyproject.toml has broken the setuptools monopoly and brought diversity to
the packaging ecosystem. At the same time, defining a Python package has
gotten easier: the legacy boilerplate of setup.py and untold configuration
files is gone, replaced with a single well-specified file with great tooling
support.
Yet, some problems remain.
Before you can work on a pyproject.toml-based project, you need to
research packaging workflows, configuration files, and associated tooling.
You have to choose one of a number of available build backends (see
“Build Frontends and Build Backends”)—and many people don’t know
what those are, let alone how to choose them. Important aspects of Python
packages remain unspecified—for example, how project sources are laid
out and which files should go into the packaging artifacts.
Dependency and environment management could be easier, too. You need
to handcraft your dependency specifications and compile them with pip-
tools, cluttering your project with requirements files. And it can be hard to
keep track of the many Python environments on a typical developer system.
The project-management tool Poetry has been addressing these problems
since (and even before) the standards governing pyproject.toml were taking
shape. Its friendly command-line interface lets you perform most of the
tasks related to packaging, dependencies, and environments. Poetry brings
its own standards-compliant build backend, poetry.core—but you can
remain blissfully unaware of this fact. It also comes with a strict
dependency resolver and locks all dependencies by default, behind the
scenes.
Poetry abstracts away many of the details I’ve covered in the preceding
three chapters. Still, learning about packaging standards and the low-level
tooling that implements them is well worth your while. Poetry itself largely
works in the framework defined by packaging standards, even though it
also ventures into new territory. Standard mechanisms like dependency
specifications or virtual environments power Poetry’s central features, and
Poetry-managed projects leverage interoperability standards when
interacting with package repositories, build frontends, and installers. An
understanding of the underlying mechanisms helps you debug situations
where Poetry’s convenient abstractions break down, such as when
misconfigurations or bugs cause packages to be installed in the wrong
environment. Finally, the experience of the past decades teaches us that
packaging tools come and go, while packaging standards are here to stay.
THE EVOLUTION OF PYTHON PROJECT MANAGERS
A decade ago, Python packaging was firmly in the hands of three tools:
setuptools, virtualenv, and pip. You’d use setuptools to create Python
packages, virtualenv to set up virtual environments, and pip to install
packages into them. Everybody did. Around 2016—the same year that
the pyproject.toml file became standard—things started to change.
In 2015, Thomas Kluyver began developing flit, an alternative build
tool that could create packages and publish them to PyPI. In 2016,
Donald Stufft from the pip maintainer team started working on Pipfile,
a proposed replacement for requirements files, including a specification
of lock files. In 2017, his work led to Kenneth Reitz’s pipenv, which
allows you to manage dependencies and environments for Python
applications and deploy them in a reproducible way. Pipenv deliberately
didn’t package your application: you’d just keep a bunch of Python
modules in a Git repository.
Poetry, started in 2018 by Sébastien Eustace, was the first tool to
provide a unified approach to packaging, dependencies, and
environments—and its adoption quickly spiraled. Two other tools
follow a similarly holistic approach: PDM, started by Frost Ming in
2019, and Hatch by Ofek Lev in 2017. Hatch has recently grown in
popularity, especially among tooling and library developers.
Poetry, Hatch, and PDM each provide a single user interface and a
streamlined workflow for Python packaging as well as for environment
and dependency management. As such, they have come to be known as
Python project managers.
Installing Poetry
Install Poetry globally using pipx, to keep its dependencies isolated from
the rest of the system:
$ pipx install poetry
If you want Poetry to run on a specific interpreter, add the --python option;
you can omit it if pipx already uses the Python version you want (see
"Installing Applications with Pipx").
When a prerelease of Poetry becomes available, you can install it side-by-
side with the stable version:
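For example, an invocation along these lines does the trick:
$ pipx install --suffix=@preview --pip-args=--pre poetry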
Above, I’ve used the --suffix option to rename the command so you can
invoke it as poetry@preview, while keeping poetry as the stable version.
The --pip-args option lets you pass options to pip, like --pre for
including prereleases.
NOTE
Poetry also comes with an official installer. You can download the installer and run it
with Python. It’s not as flexible as pipx, but it provides a simple and readily available
alternative.
Type poetry on its own to check your installation of Poetry. Poetry prints
its version and usage to the terminal, including a useful listing of all
available subcommands.
$ poetry
Creating a Project
You can create a new project using the command poetry new. As an
example, I’ll use the random-wikipedia-article project from previous
chapters. Run the following command in the parent directory where you
want to keep your new project:
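You can request the src layout shown below with the --src option, for example:
$ poetry new --src random-wikipedia-article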
After running this command, you’ll see that Poetry created a project
directory named random-wikipedia-article, with the following structure:
random-wikipedia-article
├── README.md
├── pyproject.toml
├── src
│ └── random_wikipedia_article
│ └── __init__.py
└── tests
└── __init__.py
[tool.poetry.dependencies]
python = "^3.11"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Poetry has created a standard build-system table with its build backend,
poetry.core. This means anybody can install your project from source
using pip—no need to set up, or even know about, the Poetry project
manager. Similarly, you can build packages using any standard build
frontend, such as build. Try it yourself from within the project directory:
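For example, in an active virtual environment with build installed:
$ py -m pip install build
$ py -m build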
[tool.poetry.urls]
Issues = "https://github.com/yourname/random-wikipedia-article/issues"
[tool.poetry.scripts]
random-wikipedia-article = "random_wikipedia_article:main"
The license field is a string with an SPDX identifier, not a table.
The authors field contains strings in the format name <email>, not
tables. Poetry pre-populates the field with your name and email using
the local Git configuration.
The readme field is a string with the file path. You can also specify
multiple files as an array of strings, such as README.md and
CHANGELOG.md. Poetry concatenates them with a blank line in
between.
Poetry has dedicated fields for some project URLs, namely its
homepage, repository, and documentation; for other URLs, there’s also
a generic urls table.
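Putting these fields together, the metadata in the tool.poetry table might look something like this (the description, author, and URLs are placeholders):

[tool.poetry]
name = "random-wikipedia-article"
version = "0.1.0"
description = "Display extracts from random Wikipedia articles"
license = "MIT"
authors = ["Your Name <you@example.com>"]
readme = "README.md"
homepage = "https://github.com/yourname/random-wikipedia-article"
repository = "https://github.com/yourname/random-wikipedia-article"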
Table 5-1. Metadata fields in tool.poetry

Field   Type     Description        Related project field
name    string   The project name   name
$ poetry check
All set!
Each table under packages has an include key with a file or directory. You
can use * and ** wildcards in their names and paths, respectively. The from
key allows you to include modules from subdirectories such as src. Finally,
you can use the format key to restrict modules to a specific distribution
format; valid values are sdist and wheel.
The include and exclude fields allow you to list other files to include in,
or exclude from, the distribution. Poetry seeds the exclude field using the
.gitignore file, if present. Instead of a string, you can also use a table with
path and format keys for sdist-only or wheel-only files. Example 5-3
shows how to include the test suite in source distributions.
Example 5-3. Including the test suite in source distributions
packages = [{include = "random_wikipedia_article", from = "src"}]
include = [{path = "tests", format = "sdist"}]
try:
    from importlib.metadata import metadata
except ImportError:
    # Fall back to the importlib-metadata backport on older Python versions.
    from importlib_metadata import metadata

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"
USER_AGENT = "{Name}/{Version} (Contact: {Author-email})"

def main():
    # Fill in the User-Agent header from the package's own metadata.
    fields = metadata("random-wikipedia-article")
    headers = {"User-Agent": USER_AGENT.format_map(fields)}

main()
Managing Dependencies
Let’s add the dependencies for random-wikipedia-article, starting with
Rich, the console output library:
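$ poetry add rich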
Updating dependencies
Resolving dependencies... (0.2s)
If you inspect pyproject.toml after running this command, you’ll find that
Poetry has added Rich to the dependencies table (Example 5-6):
Example 5-6. The dependencies table after adding Rich
[tool.poetry.dependencies]
python = "^3.11"
rich = "^13.3.1"
Poetry also created a virtual environment for the project and installed the
package into it (see “Managing Environments”).
Caret Constraints
The caret (^) is a Poetry-specific extension borrowed from npm, the
package manager for Node.js. Caret constraints allow releases with the
given minimum version, except those that may contain breaking changes
according to the Semantic Versioning standard. After 1.0.0, this means
patch and minor releases, but not major releases. Before 1.0.0, only patch
releases are allowed—this is because in the 0.* era, even minor releases are
allowed to introduce breaking changes.
Caret constraints are similar to tilde constraints (see “Version Specifiers”),
but the latter only allow the last version segment to increase. For example,
the constraints in each of the following pairs are equivalent:
rich = "^13.3.1"
rich = ">=13.3.1,<14"
rich = "~13.3.1"
rich = ">=13.3.1,==13.3.*"
SHOULD YOU CAP DEPENDENCIES?
It’s unfortunate that Poetry adds upper bounds to dependencies by
default. For libraries, the practice prevents downstream users from
receiving fixes and improvements, since constraints aren’t scoped to the
packages that introduce them, as in Node.js. Many open-source projects
don’t have the resources to backport fixes to past releases. For
applications, dependency locking provides a better way to achieve
reliable deployments.
The situation is similar but worse for the Python requirement.
Excluding Python 4 by default will cause disruption across the
ecosystem when the core Python team eventually releases a new major
version. It’s unlikely that Python 4 will come anywhere near Python 3
in terms of incompatible changes. Poetry’s constraint is contagious in
the sense that dependent packages must also introduce it. And it’s
impossible for Python package installers to satisfy—they can’t
downgrade the environment to an earlier version of Python.
Whenever possible, replace caret constraints in pyproject.toml with
simple lower bounds (>=), especially for Python itself. Afterward,
refresh the lock file using the command poetry lock --no-update.
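After the edit, the dependencies table for the example project would look something like this:

[tool.poetry.dependencies]
python = ">=3.11"
rich = ">=13.3.1"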
[tool.poetry.dependencies]
# Only install this dependency on the PyPy implementation of Python.
awesome = {version = ">=1", markers = "implementation_name == 'pypy'"}
Use the command poetry show to display the locked dependencies on the
terminal. Here’s what the output looked like after I added Rich:
$ poetry show
markdown-it-py 2.2.0  Python port of markdown-it. Markdown parsing, done right!
mdurl          0.1.2  Markdown URL utilities
pygments       2.14.0 Pygments is a syntax highlighting package written in Python.
rich           13.3.1 Render rich text, tables, progress bars, ...
Updating Dependencies
You can update all dependencies in the lock file to their latest versions
using a single command:
$ poetry update
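To update a single package together with its version constraint, you can use poetry add with the @latest suffix:

$ poetry add rich@latest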
This bumps the lower bound of the caret constraint to the latest version of
Rich. You can also specify the new constraint yourself after the @-sign:
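$ poetry add "rich@>=13.3.1"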
This is a handy method for removing an upper bound that also keeps the
lock file and project environment up-to-date.
If you no longer need a package for your project, you can remove it using
poetry remove:
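$ poetry remove <package>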
$ poetry shell
With the environment activated, you can run the application from the shell
prompt. But first, you need to install it into the environment. Poetry
performs an editable install of the project, so the environment reflects any
code changes immediately.
(random-wikipedia-article-py3.11) $ random-wikipedia-article
(random-wikipedia-article-py3.11) $ exit
You can also run the application in your current shell session, using the
command poetry run:
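$ poetry run random-wikipedia-article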
The poetry run command is also handy for starting an interactive Python
session:
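$ poetry run python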
[tool.poetry.dependencies]
python = "^3.7"
Refresh the lock file to bring it in sync with the updated pyproject.toml:
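$ poetry lock --no-update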
Now you’re ready to create and activate the environment for Python 3.7:
Instead of a version like 3.7, you could also specify a command like pypy3
for the PyPy implementation, or a full path like /usr/bin/python3 for the
system Python.
Finally, install the project into the new environment:
$ poetry install
Installing dependencies from lock file
• ...
• Installing zipp (3.14.0)
• Installing httpx (0.23.3)
• Installing importlib-metadata (6.0.0)
• ...
Use the command poetry env info --path to display the location of the
current environment. By default, Poetry creates virtual environments in a
shared folder. There’s a configuration setting to keep each environment in a
.venv directory inside its project instead:
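$ poetry config virtualenvs.in-project true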
Even without this setting, Poetry uses the .venv directory in the project if it
already exists.
When you no longer need an environment, remove it like this:
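$ poetry env remove 3.7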
Development Dependencies
Poetry allows you to declare development dependencies, organized in
groups. Development dependencies are not part of the project metadata and
are invisible to end users. Let’s add a dependency group for testing:
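$ poetry add --group=tests pytest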
Poetry has added the dependency group under the group table in
pyproject.toml:
[tool.poetry.group.tests.dependencies]
pytest = "^7.2.1"
You’re in for a surprise if you try to add a docs group with Sphinx, the
documentation generator. Sphinx has dropped support for Python 3.7, so
Poetry kindly refuses to add it to your dependencies. You could drop Python
3.7 yourself, but Poetry suggests another option—you can restrict Sphinx to
Python 3.8 and newer:
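$ poetry add --group=docs sphinx --python=">=3.8"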
Package Repositories
You can build Poetry-managed projects using standard tooling like build,
or you can use the Poetry command-line interface:
$ poetry build
Building random-wikipedia-article (0.1.0)
- Building sdist
- Built random_wikipedia_article-0.1.0.tar.gz
- Building wheel
- Built random_wikipedia_article-0.1.0-py3-none-any.whl
$ poetry publish
Publishing random-wikipedia-article (0.1.0) to PyPI
- Uploading random_wikipedia_article-0.1.0-py3-none-any.whl 100%
- Uploading random_wikipedia_article-0.1.0.tar.gz 100%
You can now specify the repository when publishing your project:
The command will prompt you for the password and store it in the system
keyring, if available, or in the auth.toml file on disk. Alternatively, you can
also configure credentials via environment variables:
export POETRY_PYPI_TOKEN_PYPI=<token>
export POETRY_HTTP_BASIC_PYPI_USERNAME=<username>
export POETRY_HTTP_BASIC_PYPI_PASSWORD=<password>
Poetry also supports repositories that are secured by mutual TLS or use a
custom certificate authority; see the official documentation for details.
So far, I’ve talked about how Poetry supports custom repositories on the
publisher side—how to upload your package to a repository other than
PyPI. Poetry also supports custom repositories on the consumer side; in
other words, you can add packages from other repositories to your project.
While upload targets are a per-user setting, alternative package sources are
a per-project setting and stored in pyproject.toml.
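For example, you can register an additional package source with the poetry source add command (the name and URL here are placeholders); Poetry records it in pyproject.toml:

$ poetry source add internal https://example.com/simple/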
You configure credentials for package sources just like you do for
repositories:
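$ poetry config http-basic.<source> <username>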
WARNING
If you use secondary package sources, make sure to specify the source when adding a
dependency. If you don’t, Poetry searches all package sources when looking up the
package. This isn’t just inefficient; it opens the door for a so-called dependency
confusion attack, where an attacker uploads a malicious package to PyPI with the same
name as an internal package.
Use the command poetry source show to list the package sources for the
current project:
If you no longer need the plugin, remove it from the injected packages:
In this section, I’ll introduce you to three useful plugins for Poetry:
poetry-plugin-export allows you to generate requirements and
constraints files
poetry-plugin-bundle lets you deploy the project to a virtual
environment
poetry-dynamic-versioning populates the project version from the
VCS
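For example, after injecting poetry-plugin-export into Poetry's pipx-managed environment, you can generate a requirements file with pinned versions from the lock file:

$ pipx inject poetry poetry-plugin-export
$ poetry export --output=requirements.txt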
Distribute the requirements file to the target system and use pip to install
the dependencies (typically followed by installing a wheel of your project).
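The bundle plugin, by contrast, installs the project and its locked dependencies directly into a self-contained virtual environment; here, one named production:

$ pipx inject poetry poetry-plugin-bundle
$ poetry bundle venv production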
You can test the environment by activating it and invoking the application, or by invoking the entry-point script directly. (Replace bin with Scripts if you're on Windows.)
$ production/bin/random-wikipedia-article
The bundle plugin is a great way to create a minimal Docker image for
production (Example 5-8). Docker supports multi-stage builds, where you
have a full-fledged build environment for your project in the first stage—
including tools like Poetry or even a compiler toolchain for binary
extension modules—but only a minimal runtime environment in the second
stage. This allows you to ship slim images to production, greatly speeding
up deployments and reducing bloat in your production environments.
Example 5-8. Multi-stage Dockerfile with Poetry
FROM python:3.11 AS builder
RUN python -m pip install pipx
ENV PATH="/root/.local/bin:${PATH}"
RUN pipx install poetry
RUN pipx inject poetry poetry-plugin-bundle
WORKDIR /src
COPY . .
RUN poetry bundle venv --python=/usr/local/bin/python /venv
FROM python:3.11
COPY --from=builder /venv /venv
CMD ["/venv/bin/random-wikipedia-article"]
The first FROM directive introduces the build stage, where you build and
install your project.
The second FROM directive defines the image that you deploy to
production.
The COPY directive allows you to copy the virtual environment over
from the build stage.
The CMD directive lets you run the entry-point script when users invoke
docker run with the image.
If you have Docker installed, you can try this by creating a Dockerfile with
the contents from Example 5-8 in your project and running the following
commands from the project directory:
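$ docker build -t random-wikipedia-article .
$ docker run --rm random-wikipedia-article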
After the second command, you should see the output from random-
wikipedia-article in your terminal.
[build-system]
requires = ["poetry-core>=1.0.0", "poetry-dynamic-versioning"]
build-backend = "poetry_dynamic_versioning.backend"
In the tool section, configure the plugin to derive the version from the
VCS:
[tool.poetry-dynamic-versioning]
enable = true
vcs = "git"
style = "semver"
Poetry still requires the version key in its own section. You should set it to
0 to indicate that the key is unused.
[tool.poetry]
version = "0"
You can now add a Git tag to set your project version:
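$ git tag v1.0.0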
Summary
Poetry provides a unified workflow to manage packaging, dependencies
and environments. Poetry projects are interoperable with standard tooling:
you can build them with build and upload them to PyPI with twine. But
the Poetry command-line interface also provides handy commands for these
tasks and many more.
Poetry records the precise working set of packages in its lock file, giving
you deterministic deployments and checks, as well as a consistent
experience when collaborating with others. Poetry can also track
development dependencies for you and organize them in dependency
groups that can be installed separately or together, as desired. You can
extend Poetry with plugins—for example, to bundle the project into a
virtual environment for deployment or to derive the version number from
Git.
If you need reproducible deployments for an application, if your team
develops on multiple operating systems, or if you just feel that standard
tooling adds too much overhead to your workflows, you should give Poetry
a try.
1 Apart from Poetry’s own poetry.lock and the closely related PDM lock file format, there’s
pipenv’s Pipfile.lock and the conda-lock format for Conda environments.
Chapter 6. Testing with pytest
If you think back to when you wrote your first programs, you may recall a
common, recurring experience: You had an idea for how a program could
help with a real-life task, spent a sizable amount of time coding the program
from top to bottom, only to be confronted with screens full of disheartening
error messages when you finally ran it. Or worse, it gave you results that
were sometimes subtly wrong.
There are a few lessons we’ve all learned from experiences like this. One is
to start simple, and to keep it simple as you iterate on the program. Another
lesson is to test early and repeatedly. Initially, this may just mean to run the
program manually and validate that it does what it should. Later on, if you
break the program into smaller parts, you can test those parts in isolation
and in an automated fashion. As a side effect, the program gets easier to
read and work on, too.
In this chapter, I’ll talk about how testing can help you produce value early
and consistently. Good tests amount to an executable specification of the
code you own. They set you free from tribal knowledge in a team or
company, and they speed up your development by giving you immediate
feedback on changes. The chapter focuses on the tooling side of things, but
there’s so much more to good testing practices. Luckily, other people have
written fantastic texts about this topic. Here are three of my personal
favorites:
Writing a Test
Example 6-1 revisits the original Wikipedia example from Chapter 3. The
program is as simple as it gets—and yet, it’s far from obvious how you’d
write tests for it. The main function has no inputs and no outputs—only side
effects such as writing to the standard output stream. How would you test a
function like this?
Example 6-1. The random_wikipedia_article module
import json
import textwrap
import urllib.request

API_URL = "https://en.wikipedia.org/api/rest_v1/page/random/summary"

def main():
    with urllib.request.urlopen(API_URL) as response:
        data = json.load(response)
    print(data["title"])
    print()
    print(textwrap.fill(data["extract"]))

if __name__ == "__main__":
    main()
Let’s write an end-to-end test that runs the program in a subprocess and
checks that it completes with non-empty output. End-to-end tests run the
entire program the way an end-user would. Example 6-2 shows how you
might do this. For now, you can place its code in a file
test_random_wikipedia_article.py next to the module.
Example 6-2. A test for random_wikipedia_article
import subprocess
import sys

def test_output():
    process = subprocess.run(
        [sys.executable, "-m", "random_wikipedia_article"],
        capture_output=True,
        check=True,
    )
    assert process.stdout
TIP
By convention, tests are functions whose names start with test. Use the built-in assert
statement to check for expected behavior, such as the program output not being empty.
You can run the test by invoking pipx run pytest from the directory
containing both modules. However, this won’t work if your project has any
third-party dependencies. Tests must be able to import your project and its
dependencies, so you need to install pytest and your project in the same
environment.
If you use Poetry to manage your project, add pytest to its dependencies
using poetry add:
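$ poetry add --group=tests pytest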
test_random_wikipedia_article.py . [100%]
$ py -m pytest
Once your test suite consists of more than a single module, keep it under a
tests directory. For larger projects, consider turning your test suite into a
Python package that mirrors the layout of the package under test. This lets
you have test modules in different sub-packages with the same name, and it
gives you the option to import helper modules such as common test utilities.
def fetch(url):
with urllib.request.urlopen(url) as response:
data = json.load(response)
return Article(data["title"], data["extract"])
def main():
article = fetch(API_URL)
show(article, sys.stdout)
The refactoring extracts fetch and show functions from main. It also
defines an Article class as the common denominator of these functions.
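The Article class can be a small dataclass whose fields default to empty strings, something along these lines:

from dataclasses import dataclass

@dataclass
class Article:
    title: str = ""
    summary: str = ""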
Let’s see how these changes allow you to test the parts of the program in
isolation and in a repeatable way.
For the fetch function, tests can set up a local HTTP server and perform a
roundtrip check, as shown in Example 6-4: You serve an Article instance
via HTTP, fetch the article from the server, and check that the served and
fetched instances are equal.1
Example 6-4. Testing the fetch function
def test_fetch():
article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
with serve(article) as url:
assert article == fetch(url)
The show function accepts any file-like object. While main passes
sys.stdout, tests can pass an io.StringIO instance to store the output in
memory. Example 6-5 uses this technique to check that the output ends with
a newline. The final newline ensures the output doesn’t run into the next
shell prompt.
Example 6-5. Testing the show function
def test_final_newline():
article = Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")
file = io.StringIO()
show(article, file)
assert file.getvalue().endswith("\n")
Here are some other properties of the show function that you might check
for in your tests:
import io
import unittest

class TestShow(unittest.TestCase):
    def setUp(self):
        self.file = io.StringIO()

    def test_final_newline(self):
        article = Article("lorem", "ipsum dolor")
        show(article, self.file)
        self.assertEqual("\n", self.file.getvalue()[-1])

    def test_all_words(self):
        article = Article("lorem ipsum", "dolor sit amet")
        show(article, self.file)
        for word in ("lorem", "ipsum", "dolor", "sit", "amet"):
            self.assertIn(word, self.file.getvalue())
As before, tests are functions whose names start with test, but the
functions are enclosed in a test class that derives from
unittest.TestCase. The test class has several responsibilities:
It allows the framework to run each test.
It allows tests to check for expected properties, using assert*
methods.
It allows you to prepare a test environment for each test, using the
setUp method.
In Example 6-6, the setUp method initializes an output buffer for the show
function to write to. Sometimes you need to clean up the test environment
after each test; for this purpose, you can define a corresponding tearDown
method.
Run the test suite using the command py -m unittest from the project
directory.
$ py -m unittest
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK
Thanks to the unittest library, you can test Python modules without
taking on a third-party dependency. If you're already familiar with a JUnit-
style framework from another language, you’ll feel right at home. But there
are also some problems with this design. The framework forces you to place
tests in a class inheriting from unittest.TestCase. That boilerplate hurts
readability compared to a module with simple test functions. The class-
based design also leads to strong coupling between test functions, the test
environment, and the framework itself. Finally, every assertion method (like
assertEqual and assertIn in Example 6-6) constitutes a unique little
snowflake, which betrays a lack of expressivity and generality.
The pytest Framework
These days, the third-party framework pytest has become somewhat of a
de-facto standard in the Python world. Tests written with pytest are simple
and readable—you write most tests as if there was no framework, using
basic language primitives like functions and assertions. At the same time,
the framework is powerful and expressive, as you’ll see shortly. Finally,
pytest is extensible and comes with a rich ecosystem of plugins.
Example 6-4 and Example 6-5 show what tests look like when written using
pytest: they are simple functions whose names start with test. Checks are
just generic assertions—pytest rewrites the language construct to provide
rich error reporting in case of a test failure.
TIP
If you have a test suite written with unittest, there’s no need to rewrite it to start using
pytest—pytest “speaks” unittest, too. Use pytest as a test runner right away and
rewrite your test suite incrementally later.
Every test for the show function starts by setting up an output buffer. You
can use a fixture to remove this code duplication. Fixtures are simple
functions declared with the pytest.fixture decorator:
@pytest.fixture
def file():
return io.StringIO()
Tests (and fixtures) can use a fixture by including a function parameter with
the same name. When pytest invokes the test function, it passes the return
value of the fixture function. For example, here’s Example 6-5 rewritten to
use the fixture:
def test_final_newline(file):
article = Article("lorem", "ipsum dolor")
show(article, file)
assert file.getvalue().endswith("\n")
The file fixture isn’t coupled to any specific test, so it can be reused freely
across the test suite. That sets it apart from the approach used in Example 6-
6, where the test environment is only accessible to test methods defined in
the same class or class hierarchy.
And there’s another difference: test methods in a unittest.TestCase
share a single test environment; by contrast, test functions in pytest can use
any number of fixtures. For example, you could extract the test article into
an article fixture.
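Such a fixture could reuse the sample article from the earlier tests:

@pytest.fixture
def article():
    return Article("Lorem Ipsum", "Lorem ipsum dolor sit amet.")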
If every test used the same article, you’d likely miss some edge cases,
though—you don’t want your program to crash if an article comes with an
empty summary. Example 6-7 shows how you can run a test against a
number of articles.
Example 6-7. Running tests against multiple articles
articles = [
Article(),
Article("test"),
Article("test", "lorem ipsum dolor"),
Article(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit",
"Nulla mattis volutpat sapien, at dapibus ipsum accumsan eu."
),
]
@pytest.mark.parametrize("article", articles)
def test_final_newline(article, file):
show(article, file)
assert file.getvalue().endswith("\n")
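You can achieve the same effect by turning article into a parametrized fixture; every test that takes an article parameter then runs once per entry (a sketch):

@pytest.fixture(params=articles)
def article(request):
    return request.param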
So what did you gain here? For one thing, you don’t need to decorate each
test with pytest.mark.parametrize. And there’s another advantage if
your tests aren’t all in the same module: You can place any fixture in a file
named conftest.py and use it across your entire test suite without importing
it.
The syntax for parametrized fixtures is somewhat arcane, though. To keep
things simple, I like to define a small helper for it:
def parametrized_fixture(*params):
return pytest.fixture(params=params)(lambda request: request.param)
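With the helper in place, the article fixture becomes a one-liner:

article = parametrized_fixture(*articles)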
Fixtures can get large and expensive. The remainder of this section
introduces some techniques that are useful when that happens:
A session-scoped fixture is created only once per test run.
Fixtures can be generators, allowing you to clean up resources after
use.
Fixtures can depend on other fixtures, making your test code more
modular.
A factory fixture returns a function for creating test objects, instead of
the test object itself.
Example 6-4 showed a test that fetches an article from a local server. Its
serve helper function takes an article and returns a URL for fetching the
article. More precisely, it returns the URL wrapped in a context manager, an
object for use in a with block. This allows serve to clean up after itself
when you exit the with block—say, shut down the server.3
Clearly, firing up and shutting down a web server for every test is quite
expensive. Would it help to turn the server into a fixture? At first glance, not
much—every test gets its own instance of a fixture. However, you can
instruct pytest to create a fixture only once during the entire test session:
@pytest.fixture(scope="session")
def httpserver():
...
That looks more promising, but how do you shut down the server when the
tests are done with it? Up to now, your fixtures only needed to prepare a test
object and return it. You can’t run code after a return statement. You can
run code after a yield statement, however—and so pytest allows you to
define a fixture as a generator. Example 6-9 shows the resulting
httpserver fixture.
Example 6-9. The httpserver fixture
@pytest.fixture(scope="session")
def httpserver():
with http.server.HTTPServer(("localhost", 0), ...) as server:
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
yield server
server.shutdown()
thread.join()
I’ve omitted the actual request handling for brevity—let’s assume the server
response returns server.article in its body.4 There’s still a missing piece,
though: How would you define the serve function, now that it depends on
a fixture to do its work?
You can access a fixture from a test and from another fixture. Defining
serve inside test_fetch does little to simplify the test. So let’s define
serve inside its own fixture—after all, fixtures can return any object,
including functions. Example 6-10 shows what that looks like in practice.
Example 6-10. The serve fixture
@pytest.fixture
def serve(httpserver):
def f(article):
httpserver.article = article
return f"http://localhost:{httpserver.server_port}"
return f
The outer function defines a serve fixture, which depends on
httpserver.
The inner function is the serve function you call in your tests.
The serve function no longer returns a context manager, just a plain URL
—the httpserver fixture handles all of the setting up and tearing down. As
a result, you can simplify the test case quite a bit (Example 6-11). It’s truly
a roundtrip test now!
Example 6-11. Testing the fetch function using fixtures
def test_fetch(article, serve):
assert article == fetch(serve(article))
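The pytest-httpserver plugin provides a ready-made test server, so you don't have to maintain your own. Add it to your test dependencies first, for example:

$ poetry add --group=tests pytest-httpserver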
Next, remove the existing httpserver fixture; the plugin provides a fixture
with the same name. Finally, modify the serve fixture as shown in
Example 6-12.
Example 6-12. The serve fixture using pytest-httpserver
@pytest.fixture
def serve(httpserver):
def f(article):
handler = httpserver.expect_request("/")
handler.respond_with_json(
{"title": article.title, "extract": article.summary}
)
return httpserver.url_for("/")
return f
As you can see, plugins can save you much work in testing your code. In
this case, you’ve been able to test the fetch function without implementing
your own test server.
Summary
In this chapter, you’ve learned how to use pytest to test your Python
projects.
In pytest, tests are functions that exercise your code and check for expected
behavior using the assert builtin. Prefix their name—and the name of the
containing modules—with test, and pytest will discover them
automatically. Fixtures are reusable functions or generators that set up and
tear down test objects. Use them in a test by including a parameter named
like the fixture. Plugins for pytest can provide useful fixtures, as well as
modify test execution, enhance reporting, and much more.
It’s a prime characteristic of good software that it’s easy to change. Any
piece of code used in the real world must adapt to evolving requirements
and an ever-changing environment. Tests promote ease of change in several
ways:
1 Don’t worry about the serve function for now—you’ll implement it later.
2 You can pass the file-like object to Console using its file parameter.
3 You can implement the function yourself using http.server from the standard library, as
well as the threading module and the @contextmanager decorator from contextlib.
4 If you’d like to try this yourself, derive a class from
http.server.BaseHTTPRequestHandler and pass it to HTTPServer. Its do_GET method
needs to write a response with server.article in JSON format.
5 There’s also the related question of specificity, the probability of tests passing if the code is
free of defects. One example of low specificity is flakiness—tests that fail intermittently due to external factors, like connectivity, timing, or system load. Another example is tests that
break when you change implementation details, even though the software behaved as expected.
Chapter 7. Measuring Coverage
with Coverage.py
Coverage tools record each executed statement while you run your code.
After completion, they report the overall percentage of executed statements
with respect to the entire codebase. You can use this measurement as a
reasonable proxy for the completeness of your test suite, though only as an upper bound: even full coverage doesn't guarantee that the code is free of bugs. However, if the tests cover any less than that, they definitely can't detect bugs in the uncovered parts. Therefore,
you should check that all code changes come with adequate test coverage.
How does all of this work in Python? The interpreter allows you to register
a callback—a trace function—using the function sys.settrace from the
standard library. From that point onwards, the interpreter invokes the
callback whenever it executes a line of code—as well as in some other
situations, like entering or returning from functions or raising exceptions.
Coverage tools register a trace function that records each executed line of
source code in a local database.
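For illustration, here's a minimal sketch of a trace function that prints each executed line; a real coverage tool records these locations in a database instead:

import sys

def trace(frame, event, arg):
    # Invoked for "call" events when a frame is entered; returning the
    # function enables per-line tracing ("line" events) within that frame.
    if event == "line":
        print(f"{frame.f_code.co_filename}:{frame.f_lineno}")
    return trace

sys.settrace(trace)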
By the way, coverage tools are not limited to measuring test coverage, even
though it’s their primary purpose. They can also help you determine which
modules in a large codebase are used by each endpoint of an API. Or you
could use them to find out how much of your project is documented in code
examples. What they do is simple: they record each line in your source code
when you run it, be that from test cases or in another way.
If you’re curious why the trace module reports less than 100% coverage,
take a look at the generated *.cover files, next to the respective source files.
They contain the source code annotated with execution counts; missing
lines are marked by the string >>>>>>. But don’t worry about those missing
lines too much now; you'll find out all about them in a bit.
As you can see, measuring coverage using the standard library alone is
quite cumbersome, even for a simple use case like this. While the trace
module is interesting as an early proof of concept, I don’t recommend using
it for any real world project. Instead, you should use the third-party package
coverage.
Using Coverage.py
Coverage.py is a mature, widely used, and feature-complete code coverage
tool for Python. Add coverage to your test dependencies as shown below.2
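For a Poetry-managed project, the command could look like this:

$ poetry add --group=tests "coverage[toml]"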
The toml extra allows coverage to read its configuration from
pyproject.toml on Python versions without TOML support in the standard
library (before 3.11).
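With coverage installed, run the test suite under it, for example:

$ coverage run -m pytest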
This command creates a file .coverage in the current directory. Under the
hood, this file is just a SQLite database, so feel free to poke around if you
have the sqlite3 tool ready on your system.
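Next, produce a report from the collected data:

$ coverage report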
The coverage report includes the overall percentage of code coverage and a
break-down per source file. If you specify the --show-missing option, the
report also lists the individual statements that are missing from coverage,
identified by line number.
TIP
Measuring code coverage for your test suite may seem strange—but you should always
do it. It ensures that you notice when tests aren’t run by mistake, and it can help you
identify unreachable code in the tests. This boils down to a more general piece of
advice: Treat your tests the same way you would treat any other code.
[tool.coverage.report]
show_missing = true
If your project consists of more than a single Python module, you should
also specify your top-level import package in the configuration. This allows
coverage to report modules even if they’re missing from coverage entirely,
rather than just those that showed up during execution. If your tests are
organized in a tests package with multiple test modules, list that package
as well:
[tool.coverage.run]
source = ["random_wikipedia_article", "tests"]
try: # 7
from importlib.metadata import metadata # 8
except ImportError: # 9 (missing)
from importlib_metadata import metadata # 10 (missing)
Here, the coverage report tells you that you never tested the program with
the importlib_metadata backport. This isn’t a shortcoming of your test
suite, but it is a shortcoming of how you’re running it. You need to test your
program on all supported Python versions, including Python 3.7, which
doesn’t have importlib.metadata in the standard library.
Let’s run the tests on Python 3.7. If you added python-httpserver in
“Extending pytest with Plugins”, you’ll need to revert back to the home-
grown httpserver fixture, as the plugin doesn’t support Python 3.7. Next,
switch to an environment with Python 3.7 and install the project:
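$ poetry env use 3.7
$ poetry install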
[tool.coverage.run]
branch = true
Parallel Coverage
With a single coverage data file, it’s easy to erase data accidentally by
omitting the --append option. You could configure coverage run to
append by default, but that’s error-prone, too: If you forget to run coverage
erase periodically, you end up with stale data in your report.
There’s a better way to gather coverage across multiple environments. The
coverage tool allows you to record coverage data in separate files for each
run. The option for enabling this behavior is named --parallel. (The
option name is somewhat misleading; it has nothing to do with parallel
execution.) If your tests run on more than a single Python version—or in
more than a single process, as I’ll explain below—it’s a good idea to enable
parallel mode by default in pyproject.toml:
[tool.coverage.run]
parallel = true
Even in parallel mode, coverage reports are based on a single data file.
Before reporting, you’ll therefore need to merge the data files using the
command coverage combine. That changes the two-step process from
above into a three-step one: coverage run — coverage combine —
coverage report.
Let’s put all of this together then. First, you run the test suite on each
supported Python version. For brevity, I’m only showing Python 3.7 and
3.11 here. I’m also omitting poetry install since you’ve already
installed the project into those environments.
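The sequence of commands looks something like this:

$ poetry env use 3.11
$ poetry run coverage run -m pytest
$ poetry env use 3.7
$ poetry run coverage run -m pytest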
At this point, you’ll have multiple data files in your current directory. Their
names start with .coverage, followed by the machine name, process ID, and
a random number. The command coverage combine aggregates those files
into a single .coverage file. By default, it also removes the individual files.
If you run coverage report again, you’ll notice that lines 9 and 10 are no
longer missing:
Measuring in Subprocesses
The remaining missing lines in the coverage report correspond to the body
of the main function, and its invocation at the end of the file. This is
surprising—the end-to-end test from Example 6-2 runs the entire program,
so all of those lines are definitely being tested.
If you think about how coverage measurement works, you can maybe guess
what’s going on here. The end-to-end test runs the program in a separate
process, on a separate instance of the Python interpreter. In that process,
coverage never had the chance to register its trace function, so none of
those executed lines were recorded anywhere. Fortunately, the coverage
tool provides a public API to enable tracing for the current process: the
coverage.process_startup function.
You could invoke this function from inside random_wikipedia_article,
but it would be better if you didn’t have to modify your application to
support code coverage. As it turns out, a somewhat obscure feature of
Python environments allows you to invoke the function during interpreter
startup (see [Link to Come]). The interpreter executes lines in a .pth file in
the site directory, as long as they start with an import statement. This
means that you can activate coverage by installing a coverage.pth file into
the environment, with the following contents:
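import coverage; coverage.process_startup()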
$ export COVERAGE_PROCESS_START=pyproject.toml
If you re-run the test suite, the coverage report should now consider the
program to have full coverage:
By the way, this only worked because you enabled parallel coverage earlier.
Without it, the main process would overwrite the coverage data from the
subprocess, since both would be using the same data file.
At this point, you probably think that this is way too much work for all but
the largest projects. If you had to take these steps manually each time, I’d
agree. Bear with me though until the next chapter, where I'll explain how to
automate testing and coverage reporting with Nox. Automation can give
you the full benefit of strict checks at minimal cost.3
if rare_condition:
print("got rare condition") # pragma: no cover
When you decide to exclude code from coverage, base your decision on the
cost-benefit ratio of writing a test, not merely on how cumbersome testing
would be. When you start working with a new library or interfacing with a
new system, it can be hard to figure out how to test your code. Time and
again, I’ve found it paid off to write the difficult test, or to refactor to make
testing easier. Time and again, those tests ended up detecting bugs that
would likely have gone unnoticed and caused problems in production.
Legacy projects often consist of a large codebase with minimal test
coverage. As a general rule, coverage in such projects should monotonically
increase—changes shouldn’t lead to a drop in coverage. You’ll often find
yourself in a dilemma here: To test, you need to refactor the code; but
refactoring is too risky without tests. Your first step should be to find the
minimal, safe refactoring to increase testability. Often, this will consist of
breaking a dependency of the code under test. For example, if you need to
test a function that, among many things, connects to the production
database, consider adding an optional parameter so you can pass the
connection from the outside when testing.
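Here's a hypothetical sketch of that technique, using an in-memory SQLite database as the injected stand-in for the production connection:

import sqlite3

def count_orders(connection=None):
    # Connect to the production database only when no connection is injected.
    if connection is None:
        connection = sqlite3.connect("production.db")  # stand-in for the real database
    (count,) = connection.execute("SELECT count(*) FROM orders").fetchone()
    return count

def test_count_orders():
    # Tests inject an in-memory database instead of touching production.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE orders (id INTEGER)")
    connection.execute("INSERT INTO orders VALUES (1)")
    assert count_orders(connection) == 1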
Summary
You can measure the extent to which the test suite exercises your project
using the coverage tool. This is particularly useful to discover edge cases
for which you don’t have a test. Branch coverage captures the control flow
of your program, instead of just isolated lines of source code. Parallel
coverage allows you to measure coverage across multiple environments;
you’ll need to combine the data files before reporting. Measuring coverage
in subprocesses requires setting up a .pth file and an environment variable.
Measuring test coverage effectively for a project requires some amount of
configuration, as well as the right tool incantations. In the next chapter,
you’ll see how you can automate these steps with Nox. You’ll set up checks
that give you confidence in your changes, while staying out of your way
most of the time.
2 Don’t forget to quote the square brackets if you’re a zsh user, as they’re special characters in
that shell.
3 The widely used pytest plugin pytest-cov aims to run coverage in the right way behind the
scenes. After installing the plugin, run pytest with the --cov option to enable the plugin. You
can still configure coverage in pyproject.toml. Subprocess coverage works out of the box, as
well as some other forms of parallel execution. On the other hand, while trying to be helpful,
the plugin also adds a layer of indirection. You may find that running coverage directly gives
you more fine-grained control and a better understanding of what’s going on under the covers.
About the Author
Claudio Jolowicz is a software engineer with 15 years of industry
experience in C++ and Python, and an open-source maintainer active in the
Python community. He is the author of the Hypermodern Python blog and
project template, and co-maintainer of Nox, a Python tool for test
automation. In former lives, Claudio has worked as a lawyer and as a full-
time musician touring from Scandinavia to West Africa. Get in touch with
him on Twitter: @cjolowicz