SERIOUS PYTHON
BLACK-BELT ADVICE ON DEPLOYMENT, SCALABILITY, TESTING, AND MORE
JULIEN DANJOU

WRITE LESS. CODE MORE. BUILD BETTER PROGRAMS. COVERS PYTHON 2 AND 3.

Sharpen your Python skills as you dive deep into the Python programming language with Serious Python. Written for developers and experienced programmers, Serious Python brings together more than 15 years of Python experience to teach you how to avoid common mistakes, write code more efficiently, and build better programs in less time. You'll cover a range of advanced topics like multithreading and memoization, get advice from experts on things like designing APIs and dealing with databases, and learn Python internals to give you a deeper understanding of the language itself.

You'll first learn how to start a project and tackle topics like versioning, coding style, and automated checks. Then you'll look at how to define functions efficiently, pick the right data structures and libraries, build future-proof programs, package your software for distribution, and optimize your programs down to the bytecode. You'll also learn how to:

• Create and use effective decorators and methods, including abstract, static, and class methods
• Employ Python for functional programming using generators, pure functions, and functional functions
• Extend flake8 to work with the abstract syntax tree (AST) to introduce more sophisticated automatic checks
• Apply dynamic performance analysis to identify bottlenecks in your code
• Work with relational databases and effectively manage and stream data with PostgreSQL

Take your Python skills from good to great. Learn from the experts and get seriously good at Python with Serious Python!

ABOUT THE AUTHOR
Julien Danjou is a principal software engineer at Red Hat and a contributor to OpenStack, the largest existing open source project written in Python. He has been a free software and open source hacker for the past 15 years.

www.nostarch.com
Serious Python
Black-Belt Advice on
Deployment, Scalability,
Testing, and More
by Julien Danjou
San Francisco
Serious Python. Copyright © 2019 by Julien Danjou.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage or retrieval
system, without the prior written permission of the copyright owner and the publisher.
ISBN-10: 1-59327-878-0
ISBN-13: 978-1-59327-878-6
For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other
product and company names mentioned herein may be the trademarks of their respective owners. Rather
than use a trademark symbol with every occurrence of a trademarked name, we are using the names only
in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the
trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While every precaution
has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any
liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or
indirectly by the information contained in it.
About the Author
Julien Danjou has been a free software hacker for close to twenty years
and has been developing software with Python for twelve years. He cur-
rently works as Project Team Leader for the distributed cloud platform
OpenStack, which has the largest existing open-source Python codebase
at 2.5 million lines of Python. Before building clouds, Julien created the
awesome window manager and contributed to various software such as
Debian and GNU Emacs.
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 9: The Abstract Syntax Tree, Hy, and Lisp-like Attributes . . . . . . . . . . . . . . . . 135
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Contents in Detail
Acknowledgments xv
Introduction 1
Who Should Read This Book and Why . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
About This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1
Starting Your Project 5
Versions of Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Laying Out Your Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
What to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
What Not to Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Version Numbering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Coding Style and Automated Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Tools to Catch Style Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Tools to Catch Coding Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Joshua Harlow on Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2
Modules, Libraries, and Frameworks 15
The Import System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
The sys Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Import Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Custom Importers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Meta Path Finders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Useful Standard Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
External Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
The External Libraries Safety Checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Protecting Your Code with an API Wrapper . . . . . . . . . . . . . . . . . . . . . . . . 23
Package Installation: Getting More from pip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Using and Choosing Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Doug Hellmann, Python Core Developer, on Python Libraries . . . . . . . . . . . . . . . . . . . . 27
3
Documentation and Good API Practice 33
Documenting with Sphinx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Getting Started with Sphinx and reST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Sphinx Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Writing a Sphinx Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Managing Changes to Your APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Numbering API Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Documenting Your API Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Marking Deprecated Functions with the warnings Module . . . . . . . . . . . . . . . 43
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Christophe de Vienne on Developing APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4
Handling Timestamps and Time Zones 49
The Problem of Missing Time Zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Building Default datetime Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Time Zone–Aware Timestamps with dateutil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Serializing Time Zone–Aware datetime Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Solving Ambiguous Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5
Distributing Your Software 57
A Bit of setup.py History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Packaging with setup.cfg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
The Wheel Format Distribution Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Sharing Your Work with the World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Visualizing Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Using Console Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Using Plugins and Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Nick Coghlan on Packaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6
Unit Testing 75
The Basics of Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Some Simple Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Skipping Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Running Particular Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Running Tests in Parallel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Creating Objects Used in Tests with Fixtures . . . . . . . . . . . . . . . . . . . . . . . . 81
Running Test Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Controlled Tests Using Mocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Revealing Untested Code with coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Virtual Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Setting Up a Virtual Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Using virtualenv with tox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Re-creating an Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Using Different Python Versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Integrating Other Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Testing Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Robert Collins on Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7
Methods and Decorators 99
Decorators and When to Use Them . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Creating Decorators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Writing Decorators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Stacking Decorators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Writing Class Decorators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
How Methods Work in Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Static Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
Class Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Abstract Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Mixing Static, Class, and Abstract Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Putting Implementations in Abstract Methods . . . . . . . . . . . . . . . . . . . . . . . 114
The Truth About super . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8
Functional Programming 119
Creating Pure Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Creating a Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Returning and Passing Values with yield . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Inspecting Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
List Comprehensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Functional Functions Functioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Applying Functions to Items with map() . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Filtering Lists with filter() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Getting Indexes with enumerate() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Sorting a List with sorted() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Finding Items That Satisfy Conditions with any() and all() . . . . . . . . . . . . . . 128
Combining Lists with zip() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
A Common Problem Solved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Useful itertools Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9
The Abstract Syntax Tree, Hy, and Lisp-like Attributes 135
Looking at the AST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Writing a Program Using the AST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
The AST Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
Walking Through an AST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Extending flake8 with AST Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Writing the Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Ignoring Irrelevant Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Checking for the Correct Decorator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
Looking for self . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
A Quick Introduction to Hy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Paul Tagliamonte on the AST and Hy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10
Performances and Optimizations 151
Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Understanding Behavior Through Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
cProfile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Disassembling with the dis Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Defining Functions Efficiently . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Ordered Lists and bisect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
namedtuple and Slots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Memoization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Faster Python with PyPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Achieving Zero Copy with the Buffer Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Victor Stinner on Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
11
Scaling and Architecture 177
Multithreading in Python and Its Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
Multiprocessing vs. Multithreading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Event-Driven Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Other Options and asyncio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
Service-Oriented Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Interprocess Communication with ZeroMQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
12
Managing Relational Databases 187
RDBMSs, ORMs, and When to Use Them . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Database Backends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Streaming Data with Flask and PostgreSQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Writing the Data-Streaming Application . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Building the Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Dimitri Fontaine on Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
13
Write Less, Code More 201
Using six for Python 2 and 3 Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Strings and Unicode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Handling Python Modules Moves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
The modernize Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Using Python Like Lisp to Make a Single Dispatcher . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Creating Generic Methods in Lisp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Generic Methods with Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Context Managers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Less Boilerplate with attr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Index 215
Acknowledgments
Writing this first book has been a tremendous effort. Looking back, I had
no clue how crazy this journey would be but also no idea how fulfilling it
would turn out to be.
They say that if you want to go fast you should go alone, but that if
you want to go far you should go together. This is the fourth edition of
the original book I wrote, and I would not have made it here without the
people who helped along the way. This is a team effort and I would like to
thank everyone who participated.
Most of the interviewees gave me their time and trust without a
second thought, and I owe a lot of what we teach in this book to them:
Doug Hellmann for his great advice about building libraries, Joshua
Harlow for his good humor and knowledge about distributed systems,
Christophe de Vienne for his experience in building frameworks, Victor
Stinner for his incredible CPython knowledge, Dimitri Fontaine for his
database wisdom, Robert Collins for messing around with testing, Nick Coghlan
for his work in getting Python into better shape, and Paul Tagliamonte for
his amazing hacker spirit.
Thanks to the No Starch crew for working with me on bringing this
book to a brand new level — especially to Liz Chadwick for her editing
skills, Laurel Chun for keeping me on track, and Mike Driscoll for his
technical insight.
My gratitude also goes to the free software communities who shared
their knowledge and helped me grow, especially to the Python community
which always has been welcoming and enthusiastic.
Introduction
Chapter 2 covers modules, libraries, and frameworks: using the sys
module, getting more from the pip package manager, choosing the best
framework for you, and using standard and external libraries. There's
also an interview with Doug Hellmann.
Chapter 3 gives advice on documenting your projects and managing
your APIs as your project evolves even after publication. You’ll get specific
guidance on using Sphinx to automate certain documentation tasks. Here
you’ll find an interview with Christophe de Vienne.
Chapter 4 covers the age-old issue of time zones and how best to handle
them in your programs using datetime objects and tzinfo objects.
Chapter 5 helps you get your software to users with guidance on distri-
bution. You’ll learn about packaging, distributions standards, the distutils
and setuptools libraries, and how to easily discover dynamic features in a
package using entry points. Nick Coghlan is interviewed.
Chapter 6 advises you on unit testing with best-practice tips and specific
tutorials on automating unit tests with pytest. You’ll also look at using virtual
environments to increase the isolation of your tests. The interview is with
Robert Collins.
Chapter 7 digs into methods and decorators. This is a look at using
Python for functional programming, with advice on how and when to use
decorators and how to create decorators for decorators. We’ll also dig into
static, class, and abstract methods and how to mix the three for a more
robust program.
Chapter 8 shows you more functional programming tricks you can
implement in Python. This chapter discusses generators, list comprehen-
sions, functional functions and common tools for implementing them, and
the useful functools library.
Chapter 9 peeks under the hood of the language itself and discusses
the abstract syntax tree (AST) that is the inner structure of Python. We’ll
also look at extending flake8 to work with the AST to introduce more
sophisticated automatic checks into your programs. The chapter concludes
with an interview with Paul Tagliamonte.
Chapter 10 is a guide to optimizing performance by using appropriate
data structures, defining functions efficiently, and applying dynamic per-
formance analysis to identify bottlenecks in your code. We’ll also touch on
memoization and reducing waste in data copies. You’ll find an interview with
Victor Stinner.
Chapter 11 tackles the difficult subject of multithreading, including
how and when to use multithreading as opposed to multiprocessing and
whether to use event-oriented or service-oriented architecture to create
scalable programs.
Chapter 12 covers relational databases. We’ll take a look at how they
work and how to use PostgreSQL to effectively manage and stream data.
Dimitri Fontaine is interviewed.
Finally, Chapter 13 offers sound advice on a range of topics: making
your code compatible with both Python 2 and 3, creating functional Lisp-
like code, using context managers, and reducing repetition with the attr
library.
1
Starting Your Project
Versions of Python
Before beginning a project, you’ll need to decide what version(s) of Python
it will support. This is not as simple a decision as it may seem.
It’s no secret that Python supports several versions at the same time.
Each minor version of the interpreter gets bug-fix support for 18 months
and security support for 5 years. For example, Python 3.7, released on
June 27, 2018, will be supported until Python 3.8 is released, which
should be around October 2019. Around December 2019, a last bug-fix
release of Python 3.7 will occur, and everyone will be expected to switch to
Python 3.8. Each new version of Python introduces new features and depre-
cates old ones. Figure 1-1 illustrates this timeline.
[Figure 1-1: A timeline of Python releases, showing the Python 3.7.x bug-fix releases and the Python 3.8 alpha, beta, and final releases between 2018 and 2020.]
• Versions 2.6 and older are now obsolete, so I do not recommend you
worry about supporting them at all. If you do intend to support these
older versions for whatever reason, be warned that you’ll have a hard
time ensuring that your program supports Python 3.x as well. Having
said that, you might still run into Python 2.6 on some older systems—if
that’s the case, sorry!
• Version 2.7 is and will remain the last version of Python 2.x. Every sys-
tem is basically running or able to run Python 3 one way or the other
nowadays, so unless you’re doing archeology, you shouldn’t need to
worry about supporting Python 2.7 in new programs. Python 2.7 will
cease to be supported after 2020, so the last thing you want to do is
build new software based on it.
• Version 3.7 is the most recent version of the Python 3 branch as of this
writing, and that’s the one that you should target. However, if your
operating system ships version 3.6 (most operating systems, except
Windows, ship with 3.6 or later), make sure your application will also
work with 3.6.
Techniques for writing programs that support both Python 2.7 and 3.x
will be discussed in Chapter 13.
Finally, note that this book has been written with Python 3 in mind.
Laying Out Your Project
Starting a new project is always a bit of a puzzle. You can’t be sure how your
project will be structured, so you might not know how to organize your files.
However, once you have a proper understanding of best practices, you’ll
understand which basic structure to start with. Here I’ll give some tips on
dos and don’ts for laying out your project.
What to Do
First, consider your project structure, which should be fairly simple. Use
packages and hierarchy wisely: a deep hierarchy can be a nightmare to
navigate, while a flat hierarchy tends to become bloated.
Then, avoid making the common mistake of storing unit tests out-
side the package directory. These tests should definitely be included in a
subpackage of your software so that they aren’t automatically installed as
a tests top-level module by setuptools (or some other packaging library)
by accident. By placing them in a subpackage, you ensure they can be
installed and eventually used by other packages so users can build their
own unit tests.
Figure 1-2 illustrates what a standard file hierarchy should look like.
[Figure 1-2: A standard file hierarchy, with a docs directory holding conf.py, quickstart.rst, and index.rst, and a foobar package holding __init__.py, test_cli.py, and a data directory containing image.png.]
What Not to Do
There is a particular design issue that I often encounter in project struc-
tures that have not been fully thought out: some developers will create
files or modules based on the type of code they will store. For example,
they might create functions.py or exceptions.py files. This is a terrible approach
and doesn’t help any developer when navigating the code. When reading a
codebase, the developer expects a functional area of a program to be con-
fined in a particular file. The code organization doesn’t benefit from this
approach, which forces readers to jump between files for no good reason.
Organize your code based on features, not on types.
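As a purely hypothetical sketch, a feature-oriented layout might look like this (module names invented for illustration):

foobar/
    __init__.py
    auth.py        # everything about authentication
    billing.py     # everything about billing
    cli.py         # the command line interface

Each functional area gets its own module, instead of an exceptions.py or functions.py grab bag shared by all of them.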
It is also a bad idea to create a module directory that contains only an
__init__.py file, because it’s unnecessary nesting. For example, you shouldn’t
create a directory named hooks with a single file named hooks/__init__.py
in it, where hooks.py would have been enough. If you create a directory, it
should contain several other Python files that belong to the category the
directory represents. Building a deep hierarchy unnecessarily is confusing.
You should also be very careful about the code that you put in the
__init__.py file. This file will be called and executed the first time that a
module contained in the directory is loaded. Placing the wrong things in
your __init__.py can have unwanted side effects. In fact, __init__.py files
should be empty most of the time, unless you know what you’re doing.
Don’t try to remove __init__.py files altogether though, or you won’t be
able to import your Python module at all: Python requires an __init__.py
file to be present for the directory to be considered a submodule.
Version Numbering
Software versions need to be stamped so users know which is the more
recent version. For every project, users must be able to organize the time-
line of the evolving code.
There is an infinite number of ways to organize your version numbers.
However, PEP 440 introduces a version format that every Python package,
and ideally every application, should follow so that other programs and
packages can easily and reliably identify which versions of your package
they require.
PEP 440 defines the following regular expression format for version
numbering:
N[.N]+[{a|b|c|rc}N][.postN][.devN]
This allows for standard numbering such as 1.2 or 1.2.3, as well as
optional suffixes for prerelease ({a|b|c|rc}N), post-release (.postN), and
development (.devN) versions.
NOTE You might have heard of Semantic Versioning, which provides its own guidelines
for version numbering. This specification partially overlaps with PEP 440, but unfor-
tunately, they’re not entirely compatible. For example, Semantic Versioning’s recom-
mendation for prerelease versioning uses a scheme such as 1.0.0-alpha+001 that is
not compliant with PEP 440.
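If you want to check how a particular version string is parsed and ordered, one option is the third-party packaging library (installable with pip install packaging), which implements PEP 440. A minimal sketch:

>>> from packaging.version import Version
>>> Version("1.2.3") < Version("1.2.3.post1")
True
>>> Version("1.3.0.dev1") < Version("1.3.0a1") < Version("1.3.0rc1") < Version("1.3.0")
True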
Coding Style and Automated Checks

The PEP 8 style guidelines really aren't hard to follow, and they make a
lot of sense. Most Python programmers have no trouble sticking to them as
they write code.
However, errare humanum est, and it’s still a pain to look through your
code to make sure it fits the PEP 8 guidelines. Luckily, there’s a pep8 tool
(found at https://pypi.org/project/pep8/) that can automatically check any
Python file you send its way. Install pep8 with pip, and then you can use it
on a file like so:
$ pep8 hello.py
hello.py:4:1: E302 expected 2 blank lines, found 1
$ echo $?
1
Here I use pep8 on my file hello.py, and the output indicates which lines
and columns do not conform to PEP 8 and reports each issue with a code—
here it’s line 4 and column 1. Violations of MUST statements in the speci-
fication are reported as errors, and their error codes start with an E. Minor
issues are reported as warnings, and their error codes start with a W. The
three-digit code following that first letter indicates the exact kind of error
or warning.
The hundreds digit tells you the general category of an error code:
for example, errors starting with E2 indicate issues with whitespace, errors
starting with E3 indicate issues with blank lines, and warnings starting with
W6 indicate deprecated features being used. These codes are all listed in
the pep8 readthedocs documentation (https://pep8.readthedocs.io/).
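You can also instruct pep8 to skip selected checks entirely. For example:

$ pep8 --ignore=E3 hello.py
$ echo $?
0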
This will ignore any code E3 errors inside my hello.py file. The --ignore
option allows you to effectively ignore parts of the PEP 8 specification that
you don’t want to follow. If you’re running pep8 on an existing codebase, it
also allows you to ignore certain kinds of problems so you can focus on fix-
ing issues one category at a time.
NOTE If you write C code for Python (e.g., modules), the PEP 7 standard describes the coding
style that you should follow.
Error-checking tools such as Pyflakes and Pylint make use of static
analysis—that is, they parse the code and analyze it rather than running
it outright.
If you choose to use Pyflakes, note that it doesn’t check PEP 8 confor-
mance on its own, so you’d need the second pep8 tool to cover both.
To simplify things, the Python community maintains a project named flake8
(https://pypi.org/project/flake8/) that combines pyflakes and pep8 into a single command. It
also adds some new fancy features: for example, it can skip checks on lines
containing # noqa and is extensible via plugins.
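For example, a line that deliberately violates a check can be exempted like so (a minimal sketch; E401 would normally flag the multiple imports on one line):

import os, sys  # noqa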
There are a large number of plugins available for flake8 that you can use
out of the box. For example, installing flake8-import-order (with pip install
flake8-import-order) will extend flake8 so that it also checks whether your
import statements are sorted alphabetically in your source code. Yes, some
projects want that.
In most open source projects, flake8 is heavily used for code style
verification. Some large open source projects have even written their own
plugins for flake8, adding checks for errors such as odd usage of except,
Python 2/3 portability issues, import style, dangerous string formatting,
possible localization issues, and more.
If you’re starting a new project, I strongly recommend that you use
one of these tools for automatic checking of your code quality and style.
If you already have a codebase that didn’t implement automatic code
checking, a good approach is to run your tool of choice with most of the
warnings disabled and fix issues one category at a time.
Though none of these tools may be a perfect fit for your project or your
preferences, flake8 is a good way to improve the quality of your code and
make it more durable.
NOTE Many text editors, including the famous GNU Emacs and vim, have plugins avail-
able (such as Flycheck) that can run tools such as pep8 or flake8 directly in your code
buffer, interactively highlighting any part of your code that isn’t PEP 8 compliant.
This is a handy way to fix most style errors as you write your code.
We’ll talk about extending this toolset in Chapter 9 with our own plugin
to verify correct method declaration.
Joshua Harlow on Python
Joshua Harlow is a Python developer. He was one of the technical leads on
the OpenStack team at Yahoo! between 2012 and 2016 and now works at
GoDaddy. Josh is the author of several Python libraries such as Taskflow,
automaton, and Zake.
What got you into using Python?
I started programming in Python 2.3 or 2.4 back in about 2004 dur-
ing an internship at IBM near Poughkeepsie, New York (most of my
relatives and family are from upstate NY, shout out to them!). I forget
exactly what I was doing there, but it involved wxPython and some
Python code that they were working on to automate some system.
After that internship I returned to school, went on to graduate
school at the Rochester Institute of Technology, and ended up working
at Yahoo!.
I eventually ended up in the CTO team, where I and a few others
were tasked with figuring out which open source cloud platform to use.
We landed on OpenStack, which is written almost entirely in Python.
What do you love and hate about the Python language?
Some of the things I love (not a comprehensive listing):
1. Contributors to this project are always welcome. Feel free to jump on IRC and get involved
at irc://chat.freenode.net/openstack-state-management.
2
Modules, Libraries, and Frameworks
The import system is quite complex, and I’m assuming you already
know the basics, so here I’ll show you some of the internals of this system,
including how the sys module works, how to change or add import paths,
and how to use custom importers.
First, you need to know that the import keyword is actually a wrapper
around a function named __import__. Here is a familiar way of importing a
module:
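>>> import itertools

Behind the scenes, this statement is roughly equivalent to calling the function directly:

>>> itertools = __import__("itertools")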
You can also imitate the as keyword of import, as these two equivalent
ways of importing show:
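>>> import itertools as it
>>> it
<module 'itertools' from '/usr/...'>

The same effect can be achieved with __import__: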
>>> it = __import__("itertools")
>>> it
<module 'itertools' from '/usr/...'>
Don’t forget that modules, once imported, are essentially objects whose
attributes (classes, functions, variables, and so on) are objects.
Import Paths

When importing modules, Python relies on a list of paths to know where to
look for them. This list is stored in the sys.path variable. One way to
change it is through the PYTHONPATH environment variable:

$ PYTHONPATH=/foo/bar python
>>> import sys
>>> '/foo/bar' in sys.path
True
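Since sys.path is just a regular list, you can also modify it at runtime (a minimal sketch):

>>> import sys
>>> sys.path.append('/foo/bar')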
It’s important to note that the list will be iterated over to find the
requested module, so the order of the paths in sys.path is important. It’s
useful to put the path most likely to contain the modules you are importing
early in the list to speed up search time. Doing so also ensures that if two
modules with the same name are available, the first match will be picked.
This last property is especially important because one common
mistake is to shadow Python built-in modules with your own. Your cur-
rent directory is searched before the Python Standard Library directory.
That means that if you decide to name one of your scripts random.py and
then try using import random, the file from your current directory will be
imported rather than the Python module.
Custom Importers
You can also extend the import mechanism using custom importers. This
is the technique that the Lisp-Python dialect Hy uses to teach Python how
to import files other than standard .py or .pyc files. (Hy is a Lisp implementa-
tion on top of Python, discussed later in the section “A Quick Introduction
to Hy” on page 145.)
The import hook mechanism, as this technique is called, is defined by
PEP 302. It allows you to extend the standard import mechanism, which in
turn allows you to modify how Python imports modules and build your own
system of import. For example, you could write an extension that imports
modules from a database over the network or that does some sanity check-
ing before importing any module.
Python offers two different but related ways to broaden the import
system: the meta path finders for use with sys.meta_path and the path entry
finders for use with sys.path_hooks.
For example, Hy installs a meta path finder so that modules written in Hy
can be imported; it looks roughly like this:

import os
import sys

class MetaImporter(object):
    def find_on_path(self, fullname):
        fls = ["%s/__init__.hy", "%s.hy"]
        dirpath = "/".join(fullname.split("."))
        # Try each sys.path entry for a matching package (__init__.hy)
        # or plain .hy module.
        for pth in sys.path:
            pth = os.path.abspath(pth)
            for fp in fls:
                composed_path = fp % ("%s/%s" % (pth, dirpath))
                if os.path.exists(composed_path):
                    return composed_path

    def find_module(self, fullname, path=None):
        path = self.find_on_path(fullname)
        if path:
            return MetaLoader(path)

sys.meta_path.append(MetaImporter())
Once Python has determined that the path is valid and that it points to
a module, a MetaLoader object is returned, as shown in Listing 2-3.
class MetaLoader(object):
    def __init__(self, path):
        self.path = path

    def is_package(self, fullname):
        # A name is a package if a matching __init__.hy exists on sys.path
        dirpath = "/".join(fullname.split("."))
        for pth in sys.path:
            if os.path.exists("%s/%s/__init__.hy"
                              % (os.path.abspath(pth), dirpath)):
                return True
        return False

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        if not self.path:
            return
        # Register a placeholder first to guard against circular imports.
        sys.modules[fullname] = None
        # import_file_to_module is part of Hy's importer machinery and is
        # assumed to be available here; it compiles the .hy source into a
        # module object.
        mod = import_file_to_module(fullname, self.path)
        ispkg = self.is_package(fullname)
        mod.__file__ = self.path
        mod.__loader__ = self
        mod.__name__ = fullname
        if ispkg:
            mod.__path__ = []
            mod.__package__ = fullname
        else:
            mod.__package__ = fullname.rpartition('.')[0]
        sys.modules[fullname] = mod
        return mod
• atexit allows you to register functions for your program to call when it
exits.
• argparse provides functions for parsing command line arguments.
• bisect provides bisection algorithms for sorting lists (see Chapter 10 and the short example after this list).
• calendar provides a number of date-related functions.
• codecs provides functions for encoding and decoding data.
• collections provides a variety of useful data structures.
• copy provides functions for copying data.
• csv provides functions for reading and writing CSV files.
• datetime provides classes for handling dates and times.
• fnmatch provides functions for matching Unix-style filename patterns.
• concurrent provides asynchronous computation (native in Python 3,
available for Python 2 via PyPI).
• glob provides functions for matching Unix-style path patterns.
• io provides functions for handling I/O streams. In Python 3, it also
contains StringIO (inside the module of the same name in Python 2),
which allows you to treat strings as files.
• json provides functions for reading and writing data in JSON format.
• logging provides access to Python’s own built-in logging functionality.
• multiprocessing allows you to run multiple subprocesses from your appli-
cation, while providing an API that makes them look like threads.
• operator provides functions implementing the basic Python operators,
which you can use instead of having to write your own lambda expres-
sions (see Chapter 10).
• os provides access to basic OS functions.
• random provides functions for generating pseudorandom numbers.
• re provides regular expression functionality.
• sched provides an event scheduler without using multithreading.
• select provides access to the select() and poll() functions for creating
event loops.
• shutil provides access to high-level file functions.
• signal provides functions for handling POSIX signals.
• tempfile provides functions for creating temporary files and directories.
• threading provides access to high-level threading functionality.
• urllib (and urllib2 and urlparse in Python 2.x) provides functions for
handling and parsing URLs.
• uuid allows you to generate Universally Unique Identifiers (UUIDs).
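As a quick taste of one of these, here is bisect inserting a value into a list while keeping the list sorted, with no need to re-sort afterward (a minimal sketch; Chapter 10 goes into more depth):

>>> import bisect
>>> scores = [150, 300, 450]
>>> bisect.insort(scores, 275)
>>> scores
[150, 275, 300, 450]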
External Libraries
Python’s “batteries included” philosophy is that, once you have Python
installed, you should have everything you need to build whatever you
want. This is to prevent the programming equivalent of unwrapping
an awesome gift only to find out that whoever gave it to you forgot to
buy batteries for it.
Unfortunately, there’s no way the people behind Python can predict
everything you might want to make. And even if they could, most people
wouldn’t want to deal with a multigigabyte download, especially if they just
wanted to write a quick script for renaming files. So even with its extensive
functionality, the Python Standard Library doesn’t cover everything. Luckily,
members of the Python community have created external libraries.
The Python Standard Library is safe, well-charted territory: its modules
are heavily documented, and enough people use it on a regular basis that
you can feel assured it won’t break messily when you give it a try—and in
the unlikely event that it does break, you can be confident someone will fix
it in short order. External libraries, on the other hand, are the parts of the
map labeled “here there be dragons”: documentation may be sparse, func-
tionality may be buggy, and updates may be sporadic or even nonexistent.
Any serious project will likely need functionality that only external libraries
can provide, but you need to be mindful of the risks involved in using them.
Here’s a tale of external library dangers from the trenches. OpenStack
uses SQLAlchemy, a database toolkit for Python. If you’re familiar with
SQL, you know that database schemas can change over time, so OpenStack
also made use of sqlalchemy-migrate to handle schema migration needs. And
it worked . . . until it didn’t. Bugs started piling up, and nothing was get-
ting done about them. At this time, OpenStack was also interested in sup-
porting Python 3, but there was no sign that sqlalchemy-migrate was moving
toward Python 3 support. It was clear by that point that sqlalchemy-migrate
was effectively dead for our needs and we needed to switch to something
else—our needs had outlived the capabilities of the external library. At
the time of this writing, OpenStack projects are migrating toward using
Alembic instead, a new SQL database migrations tool with Python 3 sup-
port. This is happening not without some effort, but fortunately without
much pain.
The External Libraries Safety Checklist
All of this builds up to one important question: how can you be sure
you won’t fall into this external libraries trap? Unfortunately, you can’t:
programmers are people, too, and there’s no way you can know for sure
whether a library that’s zealously maintained today will still be in good
shape in a few months. However, using such libraries may be worth the
risk; it’s just important to carefully assess your situation. At OpenStack,
we use the following checklist when choosing whether to use an external
library, and I encourage you to do the same.
Python 3 compatibility Even if you’re not targeting Python 3 right now,
odds are good that you will somewhere down the line, so it’s a good idea
to check that your chosen library is already Python 3–compatible and
committed to staying that way.
Active development GitHub and Ohloh usually provide enough infor-
mation to determine whether a given library is being actively developed
by its maintainers.
Active maintenance Even if a library is considered finished (that is,
feature complete), the maintainers should be ensuring it remains bug-
free. Check the project’s tracking system to see how quickly the main-
tainers respond to bugs.
Packaged with OS distributions If a library is packaged with major
Linux distributions, that means other projects are depending on it—so
if something goes wrong, you won’t be the only one complaining. It’s
also a good idea to check this if you plan to release your software to
the public: your code will be easier to distribute if its dependencies are
already installed on the end user’s machine.
API compatibility commitment Nothing’s worse than having your
software suddenly break because a library it depends on has changed
its entire API. You might want to check whether your chosen library has
had anything like this happen in the past.
License You need to make sure that the license is compatible with the
software you’re planning to write and that it allows you to do whatever
you intend to do with your code in terms of distribution, modification,
and execution.
Applying this checklist to dependencies is also a good idea, though that
could turn out to be a huge undertaking. As a compromise, if you know your
application is going to depend heavily on a particular library, you should
apply this checklist to each of that library’s dependencies.
Package Installation: Getting More from pip
You can list the packages you already have installed using the pip freeze
command, like so:
$ pip freeze
Babel==1.3
Jinja2==2.7.1
commando==0.3.4
--snip--
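Installed packages can be removed again with pip uninstall, using the name that pip freeze reports (output abridged):

$ pip uninstall pika-pool
Uninstalling pika-pool-0.1.3: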
--snip--
Proceed (y/n)? y
Successfully uninstalled pika-pool-0.1.3
One very valuable feature of pip is its ability to install a package with-
out copying the package’s file. The typical use case for this feature is when
you’re actively working on a package and want to avoid the long and boring
process of reinstalling it each time you need to test a change. This can be
achieved by using the -e <directory> flag:
$ pip install -e .
Obtaining file:///Users/jd/Source/daiquiri
Installing collected packages: daiquiri
Running setup.py develop for daiquiri
Successfully installed daiquiri
Here, pip does not copy the files from the local source directory but
places a special file, called an egg-link, in your distribution path. For example:
$ cat /usr/local/lib/python2.7/site-packages/daiquiri.egg-link
/Users/jd/Source/daiquiri
The egg-link file contains the path to add to sys.path to look for packages.
The result can easily be checked, for instance, by importing the package
and printing its location:
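$ python -c "import daiquiri; print(daiquiri.__file__)"
/Users/jd/Source/daiquiri/daiquiri/__init__.py

Here the module resolves inside the source checkout rather than under site-packages, which shows the egg-link is in effect. (The exact command and output shown are illustrative.)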
Another useful pip tool is the -e option of pip install, helpful for
deploying code from repositories of various version control systems, such
as git, Mercurial, Subversion, and Bazaar.
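For example (the repository URL here is illustrative):

$ pip install -e git+https://github.com/jd/daiquiri.git#egg=daiquiri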
For the installation to work correctly, you need to provide the package
egg name by adding #egg= at the end of the URL. Then, pip just uses git
clone to clone the repository inside a src/<eggname> and creates an egg-link
file pointing to that same cloned directory.
This mechanism is extremely handy when depending on unreleased ver-
sions of libraries or when working in a continuous testing system. However,
since there is no versioning behind it, the -e option can also be very nasty.
You cannot know in advance that the next commit in this remote repository
is not going to break everything.
Finally, all other installation tools are being deprecated in favor of pip,
so you can confidently treat it as your one-stop shop for all your package
management needs.
Using and Choosing Frameworks

Keep in mind that replacing an external library after you've already
written code that makes use of it is a pain, but replacing a framework is
a thousand times worse, usually requiring a complete rewrite of your
program from the ground up.
To give an example, the Twisted framework mentioned earlier still
doesn’t have full Python 3 support: if you wrote a program using Twisted
a few years back and wanted to update it to run on Python 3, you’d be out
of luck. Either you’d have to rewrite your entire program to use a differ-
ent framework, or you’d have to wait until someone finally gets around to
upgrading Twisted with full Python 3 support.
Some frameworks are lighter than others. For example, Django has its
own built-in ORM functionality; Flask, on the other hand, has nothing of
the sort. The less a framework tries to do for you, the fewer problems you’ll
have with it in the future. However, each feature a framework lacks is another
problem for you to solve, either by writing your own code or going through
the hassle of handpicking another library to handle it. It’s your choice which
scenario you’d rather deal with, but choose wisely: migrating away from a
framework when things go sour can be a Herculean task, and even with all its
other features, there’s nothing in Python that can help you with that.
Doug Hellmann, Python Core Developer, on Python Libraries
methods of the API are required and which are optional. Abstract base
classes are built into some other OOP [object-oriented programming]
languages, but I’ve found a lot of Python programmers don’t know we
have them as well.
The binary search algorithm in the bisect module is a good example
of a useful feature that’s often implemented incorrectly, which makes it
a great fit for the Standard Library. I especially like the fact that it can
search sparse lists where the search value may not be included in the data.
There are some useful data structures in the collections module
that aren’t used as often as they could be. I like to use namedtuple for
creating small, class-like data structures that need to hold data with-
out any associated logic. It’s very easy to convert from a namedtuple to a
regular class if logic does need to be added later, since namedtuple sup-
ports accessing attributes by name. Another interesting data structure
from the module is ChainMap, which makes a good stackable namespace.
ChainMap can be used to create contexts for rendering templates or man-
aging configuration settings from different sources with clearly defined
precedence.
A lot of projects, including OpenStack and external libraries, roll their
own abstractions on top of the Standard Library, like for date/time
handling, for example. In your opinion, should programmers stick to
the Standard Library, roll their own functions, switch to some external
library, or start sending patches to Python?
All of the above! I prefer to avoid reinventing the wheel, so I advocate
strongly for contributing fixes and enhancements upstream to projects
that can be used as dependencies. On the other hand, sometimes it
makes sense to create another abstraction and maintain that code sepa-
rately, either within an application or as a new library.
The timeutils module, used in your example, is a fairly thin wrap-
per around Python’s datetime module. Most of the functions are short
and simple, but creating a module with the most common operations
ensures they’re handled consistently throughout all projects. Because
a lot of the functions are application specific, in the sense that they
enforce decisions about things like timestamp format strings or what
“now” means, they are not good candidates for patches to Python’s
library or to be released as a general purpose library and adopted by
other projects.
In contrast, I have been working to move the API services in
OpenStack away from the WSGI [Web Server Gateway Interface] frame-
work created in the early days of the project and onto a third-party web
development framework. There are a lot of options for creating WSGI
applications in Python, and while we may need to enhance one to make
it completely suitable for OpenStack’s API servers, contributing those
reusable changes upstream is preferable to maintaining a “private”
framework.
What are the best ways to branch code out from an application into a
library in terms of design, planning ahead, migration, etc.?
Applications are collections of “glue code” holding libraries together
for a specific purpose. Designing your application with the features to
achieve that purpose as a library first and then building the applica-
tion ensures that code is properly organized into logical units, which in
turn makes testing simpler. It also means the features of an application
are accessible through the library and can be remixed to create other
applications. If you don’t take this approach, you risk the features of
the application being tightly bound to the user interface, which makes
them harder to modify and reuse.
What advice would you give to people planning to design their own
Python libraries?
I always recommend designing libraries and APIs from the top down,
applying design criteria such as the Single Responsibility Principle
(SRP) at each layer. Think about what the caller will want to do with
the library and create an API that supports those features. Think about
what values can be stored in an instance and used by the methods ver-
sus what needs to be passed to each method every time. Finally, think
about the implementation and whether the underlying code should be
organized differently than the code of the public API.
SQLAlchemy is an excellent example of applying those guidelines.
The declarative ORM [object relational mapping], data mapping, and
expression generation layers are all separate. A developer can decide
the right level of abstraction for entering the API and using the library
based on their needs rather than constraints imposed by the library’s
design.
What are the most common programming errors you encounter while
reading Python developers’ code?
One area where Python’s idioms are significantly different from other
languages is in looping and iteration. For example, one of the most
common anti-patterns I see is the use of a for loop to filter a list by first
appending items to a new list and then processing the result in a second
loop (possibly after passing the list as an argument to a function). I
almost always suggest converting filtering loops like these into genera-
tor expressions, which are more efficient and easier to understand. It’s
also common to see lists being combined so their contents can be pro-
cessed together in some way, rather than using itertools.chain().
There are other, more subtle things I often suggest in code reviews,
like using a dict() as a lookup table instead of a long if:then:else block,
making sure functions always return the same type of object (for exam-
ple, an empty list instead of None), reducing the number of arguments a
function requires by combining related values into an object with either
a tuple or a new class, and defining classes to use in public APIs instead
of relying on dictionaries.
What’s your take on frameworks?
Frameworks are like any other kind of tool. They can help, but you need
to take care when choosing one to make sure that it’s right for the job
at hand.
Pulling out the common parts of your app into a framework helps
you focus your development efforts on the unique aspects of an applica-
tion. Frameworks also provide a lot of bootstrapping code, for doing
things like running in development mode and writing a test suite, that
helps you bring an application to a useful state more quickly. They also
encourage consistency in the implementation of the application, which
means you end up with code that is easier to understand and more
reusable.
There are some potential pitfalls too, though. The decision to use
a particular framework usually implies something about the design
of the application itself. Selecting the wrong framework can make an
application harder to implement if those design constraints do not
align naturally with the application’s requirements. You may end up
fighting with the framework if you try to use patterns or idioms that
differ from what it recommends.
You should also include a README.rst file that explains what your project
does. This README should be displayed on your GitHub or PyPI project
page; both sites know how to handle reST formatting.
Note If you’re using GitHub, you can also add a CONTRIBUTING.rst file that will
be displayed when someone submits a pull request. It should provide a checklist for
users to follow before they submit the request, including things like whether your
code follows PEP 8 and reminders to run the unit tests. Read the Docs (http://
readthedocs.org/) allows you to build and publish your documentation online
automatically. Signing up and configuring a project is straightforward. Then Read
the Docs searches for your Sphinx configuration file, builds your documentation, and
makes it available for your users to access. It’s a great companion to code-hosting sites.
Note If you’re using setuptools or pbr (see Chapter 5) for packaging, Sphinx extends them
to support the command setup.py build_sphinx, which will run sphinx-build auto-
matically. The pbr integration of Sphinx has some saner defaults, such as outputting
the documentation in the /doc subdirectory.
Your documentation begins with the index.rst file, but it doesn’t have to
end there: reST supports include directives to include reST files from other
reST files, so there’s nothing stopping you from dividing your documenta-
tion into multiple files. Don’t worry too much about syntax and semantics
to start; reST offers a lot of formatting possibilities, but you’ll have plenty of
time to dive into the reference later. The complete reference (http://docutils
.sourceforge.net/docs/ref/rst/restructuredtext.html) explains how to create titles,
bulleted lists, tables, and more.
Sphinx Modules
Sphinx is highly extensible: its basic functionality supports only manual
documentation, but it comes with a number of useful modules that enable
automatic documentation and other features. For example, sphinx.ext.autodoc
extracts reST-formatted docstrings from your modules and generates .rst files
for inclusion. This is one of the options sphinx-quickstart will ask if you want
to activate. If you didn’t select that option, however, you can still edit your
conf.py file and add it as an extension like so:
extensions = ['sphinx.ext.autodoc']
Note that autodoc will not automatically recognize and include your
modules. You need to explicitly indicate which modules you want docu-
mented by adding something like Listing 3-2 to one of your .rst files.
.. automodule:: foobar
   :members:
   :undoc-members:
   :show-inheritance:
In Listing 3-2, we make three requests, all of which are optional: that
all documented members be printed, that all undocumented members
be printed, and that inheritance be shown. Also note the following:
• If you don’t include any directives, Sphinx won’t generate any output.
• If you only specify :members:, undocumented nodes on your module,
class, or method tree will be skipped, even if all their members are doc-
umented. For example, if you document the methods of a class but not
the class itself, :members: will exclude both the class and its methods. To
keep this from happening, you’d have to write a docstring for the class
or specify :undoc-members: as well.
• Your module needs to be where Python can import it. Adding ., ..,
and/or ../.. to sys.path can help.
The autodoc extension gives you the power to include most of your docu-
mentation in your source code. You can even pick and choose which mod-
ules and methods to document—it’s not an “all-or-nothing” solution. By
maintaining your documentation directly alongside your source code, you
can easily ensure it stays up to date.
The sphinx.ext.autosummary module generates summary tables of contents
for modules; enable it in conf.py the same way:

extensions = ['sphinx.ext.autosummary']
Then, you can add something like the following to an .rst file to auto-
matically generate a table of contents for the specified modules:
.. autosummary::

   mymodule
   mymodule.submodule
Note The sphinx-apidoc command can automatically create these files for you; check out
the Sphinx documentation to find out more.
Document: index
---------------
1 items passed all tests:
1 tests in default
1 tests in 1 items.
1 passed and 0 failed.
Test passed.
Doctest summary
===============
1 test
0 failures in tests
0 failures in setup code
0 failures in cleanup code
build succeeded.
When using the doctest builder, Sphinx reads the usual .rst files and
executes code examples that are contained in those files.
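For example, a literal block like the following in one of your .rst files would be executed and checked by the doctest builder (a minimal sketch; the surrounding prose is hypothetical):

Multiplication should work as expected::

    >>> 2 * 2
    4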
Sphinx also provides a bevy of other features, either out of the box or
through extension modules, such as linking between projects, HTML themes,
diagrams and formulas, and output to LaTeX or EPUB format. You might not
need all this functionality right away, but if you ever need it in the future,
it's good to know about in advance. Again, check out the full Sphinx
documentation to find out more.
Note For other HTTP frameworks, such as Flask, Bottle, and Tornado, you can use
sphinxcontrib.httpdomain.
My point is that whenever you know you could extract information from
your code to build documentation, you should, and you should also auto-
mate the process. This is better than trying to maintain manually written
documentation, especially when you can leverage auto-publication tools
such as Read the Docs.
We’ll examine the sphinxcontrib-pecanwsme extension as an example of
writing your own Sphinx extension. The first step is to write a module—
preferably as a submodule of sphinxcontrib, as long as your module is generic
enough—and pick a name for it. Sphinx requires this module to have one
predefined function called setup(app), which contains the methods you’ll use
to connect your code to Sphinx events and directives. The full list of methods
is available in the Sphinx extension API at http://www.sphinx-doc.org/en/master/
extdev/appapi.html.
For example, the sphinxcontrib-pecanwsme extension includes a single
directive called rest-controller, added using the setup(app) function. This
added directive needs a fully qualified controller class name to generate
documentation for, as shown in Listing 3-3.
Note Even though Sphinx is written in Python and targets it by default, extensions are
available that allow it to support other languages as well. You can use Sphinx to
document your project in full, even if it uses multiple languages at once.
Python relies on naming conventions, such as a leading underscore, to signal
whether another API is public or private and to name your own APIs. In
contrast to other languages, such as Java, Python does not enforce any
restriction on accessing code marked as private or public. The naming con-
ventions are just to facilitate understanding among programmers.
You should also make sure that you don’t remove the old interface right
away. I recommend keeping the old interface until it becomes too much
trouble to do so. If you have marked it as deprecated, users will know not to
use it.
Listing 3-4 is an example of good API change documentation for code
that provides a representation of a car object that can turn in any direction.
For whatever reason, the developers decided to retract the turn_left method
and instead provide a generic turn method that can take the direction as an
argument.
def turn_left(self):
    """Turn the car left.

    .. deprecated:: 1.1
       Use :func:`turn` instead with the direction argument set to left
    """
    self.turn(direction='left')
The triple quotes here, """, indicate the start and end of the docstring,
which will be pulled into the documentation when the user enters
help(Car.turn_left) in the terminal or extracts the documentation with an
external tool such as Sphinx. The deprecation of the car.turn_left method
is indicated by .. deprecated:: 1.1, where 1.1 refers to the first version
released that ships this code as deprecated.
Using this deprecation method and making it visible via Sphinx clearly
tells users that the function should not be used and gives them direct access
to the new function along with an explanation of how to migrate old code.
Figure 3-1 shows Sphinx documentation that explains some deprecated
functions.
The downside of this approach is that it relies on developers reading
your changelog or documentation when they upgrade to a newer version
of your Python package. However, there is a solution for that: mark your
deprecated functions with the warnings module.
Note For those who work with C, this is a handy counterpart to the __attribute__
((deprecated)) GCC extension.
To go back to the car object example in Listing 3-4, we can use this to
warn users when they are attempting to call deprecated functions, as shown
in Listing 3-5.
import warnings


class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
           Use :func:`turn` instead with the direction argument set to "left".
        """
        warnings.warn("turn_left is deprecated; use turn instead",
                      DeprecationWarning)
        self.turn(direction='left')
Listing 3-5: A documented change to the car object API using the warnings module
>>> Car().turn_left()
__main__:8: DeprecationWarning: turn_left is deprecated; use turn instead
Listing 3-6: Running Python with the -W error option and getting a deprecation error
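The caption refers to the -W error option, which turns warnings into fatal exceptions; a rough sketch of what that looks like:

$ python -W error
>>> import warnings
>>> warnings.warn("turn_left is deprecated; use turn instead",
...               DeprecationWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
DeprecationWarning: turn_left is deprecated; use turn instead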
The debtcollector library can emit these warnings for you:

from debtcollector import moves


class Car(object):
    @moves.moved_method('turn', version='1.1')
    def turn_left(self):
        """Turn the car left."""
        return self.turn(direction='left')

    def turn(self, direction):
        """Turn the car in some direction."""
        # --snip--
Here we're using the moved_method decorator from debtcollector's moves
module, which makes turn_left emit a DeprecationWarning whenever it's called.
Summary
Sphinx is the de facto standard for documenting Python projects. It sup-
ports a wide variety of syntax, and it is easy to add new syntax or features if
your project has particular needs. Sphinx can also automate tasks such as
generating indexes or extracting documentation from your code, making it
easy to maintain documentation in the long run.
Documenting changes to your API is critical, especially when you
deprecate functionality, so that users are not caught unawares. Ways to
document deprecations include the Sphinx deprecated keyword and the
warnings module, and the debtcollector library can automate maintaining
this documentation.
• How difficult will it be for users of the library to adapt their code?
Considering that there are people relying on your API, any change
you make has to be worth the effort needed to adopt it. This rule is
intended to prevent incompatible changes to the parts of the API that
are in common use. That said, one of the advantages of Python is that
it’s relatively easy to refactor code to adopt an API change.
• How easy will it be to maintain my API? Simplifying the implementa-
tion, cleaning up the codebase, making the API easier to use, having
more complete unit tests, making the API easier to understand at first
glance . . . all of these things will make your life as a maintainer easier.
• How can I keep my API consistent when applying a change? If all the
functions in your API follow a similar pattern (such as requiring the
same parameter in the first position), make sure new functions follow
that pattern as well. Also, doing too many things at once is a great way
to end up doing none of them right: keep your API focused on what it’s
meant to do.
• How will users benefit from the change? Last but not least, always con-
sider the users’ point of view.
• Use docstrings to document classes and functions in your API. If you
follow the PEP 257 (https://www.python.org/dev/peps/pep-0257/) guide-
lines, developers won’t have to read your source to understand what your
API does. Generate HTML documentation from your docstrings—and
don’t limit it to the API reference.
• Give practical examples throughout. Have at least one “startup guide”
that will show newcomers how to build a working example. The first
page of the documentation should give a quick overview of your API’s
basic and representative use case.
• Document the evolution of your API in detail, version by version.
Version control system (VCS) logs are not enough!
• Make your documentation accessible and, if possible, comfortable to
read. Your users need to be able to find it easily and get the informa-
tion they need without feeling like they’re being tortured. Publishing
your documentation through PyPI is one way to achieve this; publish-
ing on Read the Docs is also a good idea, since users will expect to find
your documentation there.
• Finally, choose a theme that is both efficient and attractive. I chose the
“Cloud” Sphinx theme for WSME, but there are plenty of other themes
out there to choose from. You don’t have to be a web expert to produce
nice-looking documentation.
>>> datetime.datetime.utcnow().tzinfo is None
True
We import the datetime library and build a timestamp with the
datetime.datetime.utcnow() method. This returns a UTC timestamp whose
values are, respectively, year, month, day, hours, minutes, seconds, and
microseconds. We can check whether an object carries time zone information
by inspecting its tzinfo attribute, and here we're told that it doesn't.
We then create the datetime object using the datetime.datetime.now()
method to retrieve the current date and time in the default time zone for
the region of the machine:
>>> datetime.datetime.now()
datetime.datetime(2018, 6, 15, 15, 24, 52, 276161)
This timestamp, too, is returned without any time zone, as we can tell
from the absence of the tzinfo field; if the time zone information had
been present, it would have appeared at the end of the output as something
like tzinfo=<UTC>.
The datetime API always returns unaware datetime objects by default,
and since there is no way for you to tell what the time zone is from the out-
put, these objects are pretty useless.
Armin Ronacher, creator of the Flask framework, suggests that an
application should always assume the unaware datetime objects in Python
are UTC. However, as we just saw, this doesn’t work for objects returned by
datetime.datetime.now(). When you are building datetime objects, I strongly
recommend that you always make sure they are time zone aware. That
ensures you can always compare your objects directly and check whether
they are returned correctly with the information you need. Let’s see how to
create time zone–aware timestamps using tzinfo objects.
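A minimal sketch using dateutil's tz helper, covered in more detail below (the time zone name is just an example):

>>> import datetime
>>> from dateutil import tz
>>> paris = tz.gettz("Europe/Paris")
>>> now = datetime.datetime.now(tz=paris)
>>> now.tzinfo is None
False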
You can also build your own datetime object with a particular date by pass-
ing the values you want for the different components of the day, as shown in
Listing 4-2.
As long as you know the name of the desired time zone, you can obtain
a tzinfo object that matches the time zone you target. The dateutil module
can access the time zone database managed by the operating system and, if
that information is for some reason unavailable, will fall back on its own
list of embedded time zones. If you ever need to access this embedded list,
you can do so via the dateutil.zoneinfo module:
>>> from dateutil.zoneinfo import get_zonefile_instance
>>> zones = list(get_zonefile_instance().zones)
>>> sorted(zones)[:5]
['Africa/Abidjan', 'Africa/Accra', 'Africa/Addis_Ababa', 'Africa/Algiers', 'Africa/Asmara']
>>> len(zones)
592
In some cases, your program does not know which time zone it's running
in, so you'll need to determine it yourself. The dateutil.tz.gettz() function
will return the local time zone of your computer if you pass no argument to
it, as shown in Listing 4-4.
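A rough sketch of what Listing 4-4 demonstrates:

>>> import datetime
>>> from dateutil import tz
>>> localzone = tz.gettz()
>>> datetime.datetime.now(tz=localzone).tzinfo is None
False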
A class exists in Python that allows you to implement time zone classes your-
self: the datetime.tzinfo class is an abstract class that provides a base for
implementing classes representing time zones. If you ever want to implement
a class to represent a time zone, you need to use this as the parent class and
implement three different methods:

• utcoffset(dt), which must return the offset from UTC as a datetime.timedelta
• dst(dt), which must return the daylight saving time adjustment as a datetime.timedelta
• tzname(dt), which must return the name of the time zone as a string

Implementing these three methods in a tzinfo subclass allows you to translate
any time zone–aware datetime to another time zone.
However, as mentioned, complete time zone databases already exist, so it's
impractical to implement those time zone classes yourself.
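Still, for illustration only, here is a minimal tzinfo subclass implementing the three methods for a fixed zero offset (use dateutil in real code):

import datetime


class UTC(datetime.tzinfo):
    """A trivial time zone class with a fixed zero offset."""

    def utcoffset(self, dt):
        # Offset from UTC: zero for UTC itself
        return datetime.timedelta(0)

    def dst(self, dt):
        # No daylight saving time adjustment
        return datetime.timedelta(0)

    def tzname(self, dt):
        return "UTC"


print(datetime.datetime(2018, 6, 15, tzinfo=UTC()).isoformat())
# 2018-06-15T00:00:00+00:00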
Listing 4-6: Using the iso8601 module to parse an ISO 8601–formatted timestamp
In Listing 4-6, the iso8601 module is used to construct a datetime object
from a string. By calling iso8601.parse_date on a string containing an ISO
8601–formatted timestamp, the library is able to return a datetime object.
Since that string does not contain any time zone information, the iso8601
module assumes the time zone is UTC. If the string does carry time zone
information, the iso8601 module returns a correctly aware datetime object.
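A rough sketch of the behavior Listing 4-6 describes (the timestamp strings are examples):

>>> import iso8601
>>> iso8601.parse_date("2018-06-15T12:00:00").tzinfo is None
False
>>> iso8601.parse_date("2018-06-15T12:00:00+02:00").utcoffset()
datetime.timedelta(0, 7200)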
Using time zone–aware datetime objects with ISO 8601 as the format for
their string representation solves most problems around time zones: it makes
sure no mistakes are made and builds great interoperability between your
application and the outside world.
Listing 4-7: A confusing timestamp, occurring during the daylight saving time crossover
On the night of October 29, 2017, Paris switched from summer to winter
time: at 3:00 am, clocks went back to 2:00 am. If we build a timestamp for
2:30 am on that date, the object cannot know whether it falls before or
after the daylight saving time change.
However, it is possible to specify which side of the fold a timestamp is
on by using the fold attribute, added to datetime objects from Python 3.6 by
PEP 495 (Local Time Disambiguation—https://www.python.org/dev/peps/
pep-0495/). This attribute indicates which side of the fold the datetime is
on, as demonstrated in Listing 4-8.
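A minimal sketch of what Listing 4-8 demonstrates, using the ambiguous 2:30 am in Paris:

>>> import datetime
>>> from dateutil import tz
>>> paris = tz.gettz("Europe/Paris")
>>> d = datetime.datetime(2017, 10, 29, 2, 30, tzinfo=paris)
>>> d.utcoffset()  # fold defaults to 0: the earlier, summer-time reading
datetime.timedelta(0, 7200)
>>> d.replace(fold=1).utcoffset()  # the later, winter-time reading
datetime.timedelta(0, 3600)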
Summary
In this chapter, we have seen how crucial it is to carry time zone informa-
tion in time stamps. The built-in datetime module is not complete in this
regard, but the dateutil module is a great complement: it allows us to get
tzinfo-compatible objects that are ready to be used. The dateutil module
also helps us solve subtle issues such as daylight saving time ambiguity.
The ISO 8601 standard format is an excellent choice for serializing and
unserializing timestamps because it is readily available in Python and com-
patible with any other programming language.
5
Distributing Your Software
#!/usr/bin/python
from distutils.core import setup

setup(name="rebuildd",
      description="Debian packages rebuild tool",
      author="Julien Danjou",
      author_email="[email protected]",
      url="http://julien.danjou.info/software/rebuildd.html",
      packages=['rebuildd'])
With the setup.py file as the root of a project, all users have to do to build
or install your software is run that file with the appropriate command as its
argument. Even if your distribution includes C modules in addition to native
Python ones, distutils can handle them automatically.
Development of distutils was abandoned in 2000; since then, other
developers have picked up where it left off. One of the notable successors
is the packaging library known as setuptools, which offers more frequent
updates and advanced features, such as automatic dependency handling,
the Egg distribution format, and the easy_install command. Since distutils
was still the accepted means of packaging software included with the Python
Standard Library at the time of development, setuptools provided a degree
of backward compatibility with it. Listing 5-2 shows how you’d use setuptools
to build the same installation package as in Listing 5-1.
#!/usr/bin/env python
import setuptools

setuptools.setup(
    name="rebuildd",
    version="0.2",
    author="Julien Danjou",
    author_email="[email protected]",
    description="Debian packages rebuild tool",
    license="GPL",
    url="http://julien.danjou.info/software/rebuildd/",
    packages=['rebuildd'],
    classifiers=[
        "Development Status :: 2 - Pre-Alpha",
        "Intended Audience :: Developers",
        "Intended Audience :: Information Technology",
        "License :: OSI Approved :: GNU General Public License (GPL)",
        "Operating System :: OS Independent",
        "Programming Language :: Python"
    ],
)
• distutils is part of the Python Standard Library and can handle simple
package installations.
• setuptools, the standard for advanced package installations, was at first
deprecated but is now back in active development and the de facto
standard.
• distribute has been merged back into setuptools as of version 0.7;
distutils2 (aka packaging) has been abandoned.
• distlib might replace distutils in the future.
There are other packaging libraries out there, but these are the five
you’ll encounter the most. Be careful when researching these libraries on
the internet: plenty of documentation is outdated due to the complicated
history outlined above. The official documentation is up-to-date, however.
In short, setuptools is the distribution library to use for the time being,
but keep an eye out for distlib in the future.
import setuptools
setuptools.setup()
Two lines of code—it is that simple. The actual metadata the setup
requires is stored in setup.cfg, as in Listing 5-3.
[metadata]
name = foobar
author = Dave Null
author-email = [email protected]
license = MIT
long_description = file: README.rst
url = http://pypi.python.org/pypi/foobar
requires-python = >=2.6
classifiers =
    Development Status :: 4 - Beta
    Environment :: Console
    Intended Audience :: Developers
    Intended Audience :: Information Technology
    License :: OSI Approved :: Apache Software License
    Operating System :: OS Independent
    Programming Language :: Python
As you can see, setup.cfg uses a format that’s easy to write and read,
directly inspired by distutils2. Many other tools, such as Sphinx or Wheel,
also read configuration from this setup.cfg file—that alone is a good argu-
ment to start using it.
In Listing 5-3, the description of the project is read from the README
.rst file. It’s good practice to always have a README file—preferably in the
RST format—so users can quickly understand what the project is about. With
just these basic setup.py and setup.cfg files, your package is ready to be pub-
lished and used by other developers and applications. The setuptools docu-
mentation provides more details if needed, for example, if you have some
extra steps in your installation process or want to include extra files.
Another useful packaging tool is pbr, short for Python Build Reasonableness.
The project was started in OpenStack as an extension of setuptools to
facilitate installation and deployment of packages. The pbr packaging tool,
used alongside setuptools, implements features absent from setuptools,
including these:

• Automatic generation of Sphinx documentation
• Automatic generation of AUTHORS and ChangeLog files based on git history
• Automatic creation of file lists for git
• Version management based on git tags
And all this with little to no effort on your part. To use pbr, you just
need to enable it, as shown in Listing 5-4.
import setuptools
setuptools.setup(setup_requires=['pbr'], pbr=True)
Once the wheel package is installed, the bdist_wheel command becomes
available to setuptools, and you can build a wheel like so:

$ python setup.py bdist_wheel
--snip--
running install_scripts
creating build/bdist.macosx-10.12-x86_64/wheel/daiquiri-1.3.0.dist-info/WHEEL
creating '/Users/jd/Source/daiquiri/dist/daiquiri-1.3.0-py2.py3-none-any.whl' and adding '.' to it
adding 'daiquiri/__init__.py'
adding 'daiquiri/formatter.py'
adding 'daiquiri/handlers.py'
--snip--
$ python wheel-0.21.0-py2.py3-none-any.whl/wheel -h
usage: wheel [-h]
             {keygen,sign,unsign,verify,unpack,install,install-scripts,convert,help}
--snip--

positional arguments:
--snip--
$ python foobar.zip
This is equivalent to:

PYTHONPATH=foobar.zip python -m __main__
In other words, the __main__ module for your program will be automati-
cally imported from __main__.py. You can also import __main__ from a mod-
ule you specify by appending a slash followed by the module name, just as
with Wheel:
$ python foobar.zip/mymod
One of the advantages of Wheel is that its naming conventions allow you
to specify whether your distribution is intended for a specific architecture
and/or Python implementation (CPython, PyPy, Jython, and so on). This is
particularly useful if you need to distribute modules written in C.
By default, Wheel packages are tied to the major version of Python that you
used to build them. When called with python2 setup.py bdist_wheel, the pattern
of a Wheel filename will be something like library-version-py2-none-any.whl.
If your code is compatible with all major Python versions (that is,
Python 2 and Python 3), you can build a universal Wheel:

$ python setup.py bdist_wheel --universal
The resulting filename will be different and contains both Python major
versions—something like library-version-py2.py3-none-any.whl. Building a uni-
versal Wheel avoids ending up with two different Wheels when only one would
cover both Python major versions.
If you don’t want to pass the --universal flag each time you are building
a Wheel, you can just add this to your setup.cfg file:
[wheel]
universal=1
The sdist command creates a tarball under the dist directory of the
source tree. The tarball contains all the Python modules that are part of
the source tree. As seen in the previous section, you can also build Wheel
archives using the bdist_wheel command. Wheel archives are a bit faster to
install as they’re already in the correct format for installation.
The final step to make that code accessible is to export your package
somewhere users can install it via pip. That means publishing your project
to PyPI.
If it’s your first time exporting to PyPI, it pays to test out the publishing
process in a safe sandbox rather than on the production server. You can
use the PyPI staging server for this purpose; it replicates all the functional-
ity of the main index but is solely for testing purposes.
The first step is to register your project on the test server. Start by open-
ing your ~/.pypirc file and adding these lines:
[distutils]
index-servers =
    testpypi
[testpypi]
username = <your username>
password = <your password>
repository = https://testpypi.python.org/pypi
Save the file, and now you can register your project in the index:

$ python setup.py register -r testpypi
This connects to the test PyPI server instance and creates a new entry.
Don’t forget to use the -r option; otherwise, the real production PyPI
instance would be used!
Obviously, if a project with the same name is already registered there,
the process will fail. Retry with a new name, and once you get your program
registered and receive the OK response, you can upload a source distribution
tarball, as shown in Listing 5-7.
--snip--
creating build/bdist.linux-x86_64/wheel/ceilometer-2014.1.a6.g772e1a7.dist-info/WHEEL
running upload
Submitting /home/jd/Source/ceilometer/dist/ceilometer-2014.1.a6.g772e1a7-py27-none-any.whl to https://testpypi.python.org/pypi
Server response (200): OK
Once those operations are finished, you and other users can search for
the uploaded packages on the PyPI staging server and even install them
using pip, specifying the test server with the -i option:

$ pip install -i https://testpypi.python.org/pypi ceilometer
If everything checks out, you can upload your project to the main PyPI
server. Just make sure to add your credentials and the details for the server
to your ~/.pypirc file first, like so:
[distutils]
index-servers =
    pypi
    testpypi

[pypi]
username = <your username>
password = <your password>
[testpypi]
repository = https://testpypi.python.org/pypi
username = <your username>
password = <your password>
Now if you run register and upload with the -r pypi switch, your package
should be uploaded to PyPI.
NOTE PyPI can keep several versions of your software in its index, allowing you to install
specific and older versions, if you ever need to. Just pass the version number to the pip
install command; for example, pip install foobar==1.0.2.
Entry Points
You may have already used setuptools entry points without knowing any-
thing about them. Software distributed using setuptools includes important
metadata describing features such as its required dependencies and—more
relevantly to this topic—a list of entry points. Entry points are methods by
which other Python programs can discover the dynamic features a package
provides.
The following example shows how to provide an entry point named
rebuildd in the console_scripts entry point group:
#!/usr/bin/python
from setuptools import setup

setup(name="rebuildd",
      description="Debian packages rebuild tool",
      author="Julien Danjou",
      author_email="[email protected]",
      url="http://julien.danjou.info/software/rebuildd.html",
      entry_points={
          'console_scripts': [
              'rebuildd = rebuildd:main',
          ],
      },
      packages=['rebuildd'])
Any Python package can register entry points. Entry points are orga-
nized in groups: each group is made of a list of key and value pairs. Those
pairs use the format path.to.module:variable_name. In the previous example,
the key is rebuildd, and the value is rebuildd:main.
The list of entry points can be manipulated using various tools, from
setuptools to epi, as I’ll show here. In the following sections, we discuss how
we can use entry points to add extensibility to our software.
The output from epi group list in Listing 5-9 shows the different pack-
ages on a system that provide entry points. Each item in this table is the
name of an entry point group. Note that this list includes console_scripts,
which we’ll discuss shortly. We can use the epi command with the show com-
mand to show details of a particular entry point group, as in Listing 5-10.
Using Console Scripts
When writing a Python application, you almost always have to provide a
launchable program—a Python script that the end user can run—that
needs to be installed inside a directory somewhere in the system path.
Most projects have a launchable program similar to this:
#!/usr/bin/python
import sys
import mysoftware
mysoftware.SomeClass(sys.argv).run()
This approach has some major downsides:

• There’s no way the user can know where the Python interpreter is or
which version it uses.
• This script leaks binary code that can’t be imported by software or
unit tests.
• There’s no easy way to define where to install this script.
• It’s not obvious how to install this in a portable way (for example, on
both Unix and Windows).
The console_scripts entry point group addresses all of these issues.
Suppose we put the following in foobar/client.py:

def main():
    print("Client started")

And in foobar/server.py:

def main():
    print("Server started")
We can then ship both programs with a setup.py that registers the two
functions in the console_scripts entry point group:

from setuptools import setup

setup(
    name="foobar",
    version="1",
    description="Foo!",
    author="Julien Danjou",
    author_email="[email protected]",
    packages=["foobar"],
    entry_points={
        "console_scripts": [
            "foobard = foobar.server:main",
            "foobar = foobar.client:main",
        ],
    },
)
When the package is installed, setuptools generates a small wrapper script
for each console_scripts entry; the one for foobar looks like this:

#!/usr/bin/python
# EASY-INSTALL-ENTRY-SCRIPT: 'foobar==1','console_scripts','foobar'
__requires__ = 'foobar==1'
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.exit(
        load_entry_point('foobar==1', 'console_scripts', 'foobar')()
    )
This code scans the entry points of the foobar package and retrieves the
foobar key from the console_scripts group, which is used to locate and run
the corresponding function. The return value of the load_entry_point will
then be a reference to the function foobar.client.main, which will be called
without any arguments and whose return value will be used as an exit code.
Notice that this code uses pkg_resources to discover and load entry point
files from within your Python programs.
NOTE If you’re using pbr on top of setuptools, the generated script is simpler (and therefore
faster) than the default one built by setuptools, as it will call the function you wrote in
the entry point without having to search the entry point list dynamically at runtime.
Using Plugins and Drivers
Entry points make it easy to discover and dynamically load code deployed
by other packages, but this is not their only use. Any application can pro-
pose and register entry points and groups and then use them as it wishes.
In this section, we're going to create a cron-style daemon, pycrond, that
will allow any Python program to register a command to be run once
every few seconds by registering an entry point in the group pytimed. The
attribute indicated by such an entry point should be a callable that returns
the number of seconds to wait and the callable to invoke.
Here’s our implementation of pycrond using pkg_resources to discover
entry points, in a program I’ve named pytimed.py:
import pkg_resources
import time


def main():
    seconds_passed = 0
    while True:
        for entry_point in pkg_resources.iter_entry_points('pytimed'):
            try:
                seconds, callable = entry_point.load()()
            except:
                # Ignore failure
                pass
            else:
                if seconds_passed % seconds == 0:
                    callable()
        time.sleep(1)
        seconds_passed += 1
This program consists of an infinite loop that iterates over each entry
point of the pytimed group. Each entry point is loaded using the load()
method, and the program then calls the loaded function, which must return
the number of seconds to wait before invoking the callable, as well as
the callable itself.
The program in pytimed.py is a very simplistic and naive implementa-
tion, but it is sufficient for our example. Now we can write another Python
program, named hello.py, that needs one of its functions called on a periodic
basis:
def print_hello():
    print("Hello, world!")


def say_hello():
    return 2, print_hello
And the accompanying setup.py:

from setuptools import setup

setup(
    name="hello",
    version="1",
    packages=["hello"],
    entry_points={
        "pytimed": [
            "hello = hello:say_hello",
        ],
    },
)
The setup.py script registers an entry point in the group pytimed with the
key hello and the value pointing to the function hello.say_hello. Once that
package is installed using that setup.py—for example, using pip install—the
pytimed script can detect the newly added entry point.
At startup, pytimed will scan the group pytimed and find the key hello. It
will then call the hello.say_hello function, getting two values: the number of
seconds to wait between each call and the function to call, 2 seconds and
print_hello in this case. By running the program, as we do in Listing 5-12,
you can see “Hello, world!” printed on the screen every 2 seconds.
The possibilities this mechanism offers are immense: you can build
driver systems, hook systems, and extensions easily and generically.
Implementing this mechanism by hand in every program you make would be
tedious, but fortunately, there's a Python library that can take care of the
boring parts for us.
The stevedore library provides support for dynamic plugins based on
the same mechanism demonstrated in our previous examples. The use
case in this example is already simplistic, but we can still simplify it fur-
ther in this script, pytimed_stevedore.py:
import time

from stevedore.extension import ExtensionManager


def main():
    seconds_passed = 0
    extensions = ExtensionManager('pytimed', invoke_on_load=True)
    while True:
        for extension in extensions:
            try:
                seconds, callable = extension.obj
            except:
                # Ignore failure
                pass
            else:
                if seconds_passed % seconds == 0:
                    callable()
        time.sleep(1)
        seconds_passed += 1
import time

from stevedore.driver import DriverManager


def main(name):
    seconds_passed = 0
    seconds, callable = DriverManager(
        'pytimed', name, invoke_on_load=True).driver
    while True:
        if seconds_passed % seconds == 0:
            callable()
        time.sleep(1)
        seconds_passed += 1


main("hello")
Listing 5-13: Using stevedore to run a single extension from an entry point
In this case, only one extension is loaded and selected by name. This
allows us to quickly build a driver system in which only one extension is
loaded and used by a program.
Summary
The packaging ecosystem in Python has a bumpy history; however, the situ-
ation is now settling. The setuptools library provides a complete solution to
packaging, not only to transport your code in different formats and upload
it to PyPI, but also to handle connection with other software and libraries
via entry points.
PEP 426, which defines a new metadata format for Python packages, is
still fairly recent and not yet approved. How do you think it will tackle
current packaging problems?
PEP 426 originally started as part of the Wheel format definition, but
Daniel Holth realized that Wheel could work with the existing metadata
format defined by setuptools. PEP 426 is thus a consolidation of the exist-
ing setuptools metadata with some of the ideas from distutils2 and other
packaging systems (such as RPM and npm). It addresses some of the frustra-
tions encountered with existing tools (for example, with cleanly separat-
ing different kinds of dependencies).
The main gains will be a REST API on PyPI offering full metadata
access, as well as (hopefully) the ability to automatically generate distri-
bution policy–compliant packages from upstream metadata.
The Wheel format is somewhat recent and not widely used yet, but it seems
promising. Why is it not part of the Standard Library?
It turns out the Standard Library is not really a suitable place for
packaging standards: it evolves too slowly, and an addition to a later
version of the Standard Library cannot be used with earlier versions
of Python. So, at the Python language summit earlier this year, we
tweaked the PEP process to allow distutils-sig to manage the full
approval cycle for packaging-related PEPs, and python-dev will only be
involved for proposals that involve changing CPython directly (such
as pip bootstrapping).
6
Unit Testing
def test_true():
    assert True
This will simply assert that the behavior of the program is what you
expect. To run this test, you need to load the test_true.py file and run the
test_true() function defined within.
However, writing and running an individual test for each of your test
files and functions would be a pain. For small projects with simple usage,
the pytest package comes to the rescue—once installed via pip, pytest pro-
vides the pytest command, which loads every file whose name starts with
test_ and then executes all functions within that start with test_.
With just the test_true.py file in our source tree, running pytest gives us
the following output:
$ pytest -v test_true.py
========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.3.2, py-1.5.2, pluggy-0.6.0 --
/usr/local/opt/python/bin/python3.6
cachedir: .cache
rootdir: examples, inifile:
collected 1 item

test_true.py::test_true PASSED [100%]
The -v option tells pytest to be verbose and print the name of each
test run on a separate line. If a test fails, the output changes to indicate
the failure, accompanied by the whole traceback.
Let’s add a failing test this time, as shown in Listing 6-2.
def test_false():
    assert False
$ pytest -v test_true.py
========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.3.2, py-1.5.2, pluggy-0.6.0 -- /usr/
local/opt/python/bin/python3.6
cachedir: .cache
rootdir: examples, inifile:
collected 2 items

test_true.py::test_true PASSED [ 50%]
test_true.py::test_false FAILED [100%]
def test_false():
> assert False
E assert False
test_true.py:5: AssertionError
=================== 1 failed, 1 passed in 0.07 seconds ===================
def test_key():
    a = ['a', 'b']
    b = ['b']
    assert a == b
$ pytest test_true.py
========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.3.2, py-1.5.2, pluggy-0.6.0
rootdir: /Users/jd/Source/python-book/examples, inifile:
plugins: celery-4.1.0
collected 1 item
test_true.py F [100%]
def test_key():
a = ['a', 'b']
b = ['b']
> assert a == b
E AssertionError: assert ['a', 'b'] == ['b']
E At index 0 diff: 'a' != 'b'
E Left contains more items, first extra item: 'b'
E Use -v to get the full diff
test_true.py:10: AssertionError
======================== 1 failed in 0.07 seconds ========================
This tells us that a and b are different and that this test does not pass.
It also tells us exactly how they are different, making it easy to fix the test
or code.
Skipping Tests
If a test cannot be run, you will probably want to skip that test—for example,
you may wish to run a test conditionally based on the presence or absence
of a particular library. To that end, you can use the pytest.skip() func-
tion, which will mark the test as skipped and move on to the next one. The
pytest.mark.skip decorator skips the decorated test function unconditionally,
so you’ll use it when a test always needs to be skipped. Listing 6-3 shows how
to skip a test using these methods.
import pytest

try:
    import mylib
except ImportError:
    mylib = None


@pytest.mark.skip("Do not run this")
def test_fail():
    assert False


@pytest.mark.skipif(mylib is None, reason="mylib is not available")
def test_mylib():
    assert mylib.foobar() == 42


def test_skip_at_runtime():
    if True:
        pytest.skip("Finally I don't want to run it")
$ pytest -v examples/test_skip.py
========================== test session starts ===========================
platform darwin -- Python 3.6.4, pytest-3.3.2, py-1.5.2, pluggy-0.6.0 -- /usr/
local/opt/python/bin/python3.6
cachedir: .cache
rootdir: examples, inifile:
collected 3 items
examples/test_skip.py::test_fail SKIPPED
[ 33%]
examples/test_skip.py::test_mylib SKIPPED
[ 66%]
examples/test_skip.py::test_skip_at_runtime SKIPPED
[100%]
The output of the test run in Listing 6-3 indicates that, in this case,
all the tests have been skipped. This information allows you to ensure you
didn’t accidentally skip a test you expected to run.
examples/test_skip.py::test_fail SKIPPED
[100%]
=== 2 tests deselected ===
=== 1 skipped, 2 deselected in 0.04 seconds ===
Names are not always the best way to filter which tests will run. Commonly,
a developer would group tests by functionalities or types instead. Pytest
provides a dynamic marking system that allows you to mark tests with a key-
word that can be used as a filter. To mark tests in this way, use the -m option.
If we set up a couple of tests like this:
import pytest


@pytest.mark.dicttest
def test_something():
    a = ['a', 'b']
    assert a == a


def test_something_else():
    assert False
we can use the -m argument with pytest to run only one of those tests:

$ pytest -v test_mark.py -m dicttest
test_mark.py::test_something PASSED [100%]
The -m marker accepts more complex queries, so we can also run all
tests that are not marked:

$ pytest test_mark.py -m 'not dicttest'
test_mark.py F [100%]
def test_something_else():
> assert False
E assert False
test_mark.py:10: AssertionError
=== 1 tests deselected ===
=== 1 failed, 1 deselected in 0.07 seconds ===
Here pytest executed every test that was not marked as dicttest—in this
case, the test_something_else test, which failed. The remaining marked test,
test_something, was not executed and so is listed as deselected.
Pytest accepts complex expressions composed of the or, and, and not key-
words, allowing you to do more advanced filtering.
Here’s a simple fixture:
import pytest
@pytest.fixture
def database():
return <some database connection>
def test_insert(database):
database.insert(123)
The database fixture is automatically used by any test that has database
in its argument list. The test_insert() function will receive the result of
the database() function as its first argument and use that result as it wants.
When we use a fixture this way, we don’t need to repeat the database initial-
ization code several times.
Another common feature of code testing is tearing down after a test
has used a fixture. For example, you may need to close a database connec-
tion. Implementing the fixture as a generator allows us to add teardown
functionality, as shown in Listing 6-5.
import pytest


@pytest.fixture
def database():
    db = <some database connection>
    yield db
    db.close()


def test_insert(database):
    database.insert(123)
Because we used the yield keyword and made database a generator, the
code after the yield statement runs when the test is done. That code will
close the database connection at the end of the test.
However, closing a database connection for each test might impose an
unnecessary runtime cost, as tests may be able to reuse that same connec-
tion. In that case, you can pass the scope argument to the fixture decorator,
specifying the scope of the fixture:
import pytest


@pytest.fixture(scope="module")
def database():
    db = <some database connection>
    yield db
    db.close()


def test_insert(database):
    database.insert(123)
By specifying the scope="module" parameter, you initialize the fixture
once for the whole module, and the same database connection will be
passed to all test functions requesting a database connection.
Finally, you can run some common code before and after your tests by
marking fixtures as automatically used with the autouse keyword, rather than
specifying them as an argument for each of the test functions. Specifying
the autouse=True keyword argument to the pytest.fixture() function will
make sure the fixture is called before running any test in the module or
class it is defined in, as in this example:
import os

import pytest


@pytest.fixture(autouse=True)
def change_user_env():
    curuser = os.environ.get("USER")
    os.environ["USER"] = "foobar"
    yield
    os.environ["USER"] = curuser


def test_user():
    assert os.getenv("USER") == "foobar"
Such automatically enabled features are handy, but make sure not to
abuse fixtures: they are run before each and every test covered by their
scope, so they can slow down a test run significantly.
Fixtures can also be parametrized: pass a params list to the fixture
decorator, and the fixture, along with every test that uses it, will run
once per parameter. This is handy for running the same tests against
several database drivers:

import pytest

import myapp


@pytest.fixture(params=["mysql", "postgresql"])
def database(request):
    d = myapp.driver(request.param)
    d.start()
    yield d
    d.stop()


def test_insert(database):
    database.insert("somedata")
In Python 3, the mock library is part of the Standard Library as
unittest.mock; on Python 2, it is available as the external mock package.
The portable way to import it is:

try:
    from unittest import mock
except ImportError:
    import mock
>>> import mock
>>> m = mock.Mock()
>>> m.some_method.return_value = 42
>>> m.some_method()
42
>>> m.some_attribute = "hello world"
>>> m.some_attribute
'hello world'
In just a few lines, your mock.Mock object now has a some_method() method
that returns 42. It accepts any kind of argument, and there is no check on
what the values are—yet.
Dynamically created methods can also have (intentional) side effects.
Rather than being boilerplate methods that just return a value, they can be
defined to execute useful code.
Listing 6-9 creates a fake method that has the side effect of printing the
"hello world" string.
The mock library uses the action/assertion pattern: this means that once
your test has run, it's up to you to check that the actions you are mocking
were correctly executed. Listing 6-10 applies the assert_called family of
methods to our mock objects to perform these checks.
We call the mock method with the arguments foo and bar to stand in
for our test's actions. The usual way to check calls to a mock object
is to use the assert_called() methods, such as assert_called_once_with().
To these methods, you pass the values that you expect callers to use
when calling your mock method. If the values passed are not the ones being
used, then mock raises an AssertionError. If you don't know what arguments
may be passed, you can use mock.ANY as a value; that will match any argu-
ment passed to your mock method.
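A minimal sketch of the checks Listing 6-10 performs:

from unittest import mock

m = mock.Mock()
m.some_method('foo', 'bar')

# Passes: the mock was called exactly once with these arguments.
m.some_method.assert_called_once_with('foo', 'bar')
# Also passes: mock.ANY matches whatever was passed in that position.
m.some_method.assert_called_once_with('foo', mock.ANY)
# Raises AssertionError: the arguments do not match the actual call.
m.some_method.assert_called_once_with('foo', 'baz')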
The mock library can also be used to patch a function, method, or
object from an external module. In Listing 6-11, we replace the os.unlink()
function with a fake function we provide.
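A rough sketch of what Listing 6-11 does:

import os
from unittest import mock

with mock.patch('os.unlink') as fake_unlink:
    # Inside the context, os.unlink is the mock, not the real function,
    # so no file is actually removed.
    os.unlink('some-file')
    fake_unlink.assert_called_once_with('some-file')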
When used as a context manager, mock.patch() replaces the target func-
tion with the function we provide so the code executed inside the context
uses that patched method. With the mock.patch() method, it’s possible to
change any part of an external piece of code, making it behave in a way
that lets you test all conditions in your application, as shown in Listing 6-12.
import pytest
import requests
from unittest import mock


class WhereIsPythonError(Exception):
    pass


def is_python_still_a_programming_language():
    try:
        r = requests.get("http://python.org")
    except IOError:
        pass
    else:
        if r.status_code == 200:
            return 'Python is a programming language' in r.content
    raise WhereIsPythonError("Something bad happened")


def get_fake_get(status_code, content):
    m = mock.Mock()
    m.status_code = status_code
    m.content = content

    def fake_get(url):
        return m

    return fake_get


def raise_get(url):
    raise IOError("Unable to fetch url %s" % url)


@mock.patch('requests.get', get_fake_get(
    200, 'Python is a programming language for sure'))
def test_python_is():
    assert is_python_still_a_programming_language() is True


@mock.patch('requests.get', get_fake_get(
    200, 'Python is no more a programming language'))
def test_python_is_not():
    assert is_python_still_a_programming_language() is False


@mock.patch('requests.get', raise_get)
def test_ioerror():
    with pytest.raises(WhereIsPythonError):
        is_python_still_a_programming_language()
Listing 6-12 implements a test suite that searches for all instances of the
string “Python is a programming language” on the http://python.org/ web
page. There is no way to test negative scenarios (where this sentence is not
on the web page) without modifying the page itself—something we’re not
able to do, obviously. In this case, we’re using mock to cheat and change the
behavior of the request so it returns a mocked reply with a fake page that
doesn’t contain that string. This allows us to test the negative scenario in
which http://python.org/ does not contain this sentence, making sure the pro-
gram handles that case correctly.
This example uses the decorator version of mock.patch(). Using the
decorator does not change the mocking behavior, but it is simpler when you
need to use mocking within the context of an entire test function.
Using mocking, we can simulate any problem, such as a web server
returning a 404 error, an I/O error, or a network latency issue. We can
make sure code returns the correct values or raises the correct exception
in every case, ensuring our code always behaves as expected.
Note The command may also be named python-coverage, if you install coverage through
your operating system installation software. This is the case on Debian, for example.
When using pytest, just install the pytest-cov plugin via pip install
pytest-cov and add a few option switches to generate a detailed code
coverage output, as shown in Listing 6-13.
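A rough sketch of the invocation shown in Listing 6-13 (the package name mylib is hypothetical):

$ pytest --cov=mylib mylib/tests/

The coverage table printed after the test results lists, per module, how many statements were run and how many were missed.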
The --cov option enables the coverage report at the end of the test run.
You need to pass the package name as an argument for the plugin to filter
the coverage report properly. The output includes the lines of code that
were not run and therefore have no tests. All you need to do now is spawn
your favorite text editor and start writing tests for that code.
However, coverage goes one better, allowing you to generate clear
HTML reports. Simply add the --cov-report=html flag, and the htmlcov
directory from which you ran the command will be populated with
HTML pages. Each page will show you which parts of your source code
were or were not run.
If you want to enforce a minimum, you can use the option
--cov-fail-under=COVER_MIN_PERCENTAGE, which will make the test suite fail
if a minimum percentage of the code is not executed when the test suite is run.
While having a good coverage percentage is a decent goal, and while the
tool is useful to gain insight into the state of your test coverage, defining
an arbitrary percentage value does not provide much insight. Figure 6-1
shows an example of a coverage report with the percentage at the top.
For example, a code coverage score of 100 percent is a respectable goal,
but it does not necessarily mean the code is entirely tested and you can rest.
It only proves that your whole code path has been run; there is no indica-
tion that every possible condition has been tested.
You should use coverage information to consolidate your test suite and
add tests for any code that is currently not being run. This facilitates later
project maintenance and increases your code’s overall quality.
Figure 6-1: Coverage of ceilometer.publisher
Virtual Environments
Earlier we mentioned the danger that your tests may not capture the
absence of dependencies. Any application of significant size inevitably
depends on external libraries to provide features the application needs,
but there are many ways external libraries might cause issues on your
operating system. Here are a few:
• Your system does not have the library you need packaged.
• Your system does not have the right version of the library you need
packaged.
• You need two different versions of the same library for two different
applications.
These problems can happen when you first deploy your application or
later on, while it’s running. Upgrading a Python library installed via your
system manager might break your application in a snap without warning,
for reasons as simple as an API change in the library being used by the
application.
The solution is for each application to use a library directory that con-
tains all the application’s dependencies. This directory is then used to load
the needed Python modules rather than the system-installed ones.
Such a directory is known as a virtual environment.
To create a virtual environment, run the venv module with a destination
directory as its argument:

$ python3 -m venv myvenv

Once run, venv creates a lib/pythonX.Y directory and uses it to install pip
into the virtual environment, which will be useful to install further Python
packages.
You can then activate the virtual environment by “sourcing” the activate
command. Use the following on Posix systems:
$ source myvenv/bin/activate
And the following on Windows:

> \myvenv\Scripts\activate
Once you do that, your shell prompt should appear prefixed by the
name of your virtual environment. Executing python will call the version of
Python that has been copied into the virtual environment. You can check
that it’s working by reading the sys.path variable and checking that it has
your virtual environment directory as its first component.
You can stop and leave the virtual environment at any time by calling
the deactivate command:
$ deactivate
That’s it. Also note that you are not forced to run activate if you want to
use the Python installed in your virtual environment just once. Calling the
python binary will also work:
$ myvenv/bin/python
Now, while we’re in our activated virtual environment, we do not have
access to any of the modules installed and available on the main system.
That is the point of using a virtual environment, but it does mean we prob-
ably need to install the packages we need. To do that, use the standard pip
command to install each package, and the packages will install in the right
place, without changing anything about your system:
$ source myvenv/bin/activate
(myvenv) $ pip install six
Downloading/unpacking six
Downloading six-1.4.1.tar.gz
Running setup.py egg_info for package six
Voilà! We can install all the libraries we need and then run our applica-
tion from this virtual environment, without breaking our system. It’s easy to
see how we can script this to automate the installation of a virtual environ-
ment based on a list of dependencies, as in Listing 6-14.
virtualenv myappvenv
source myappvenv/bin/activate
pip install -r requirements.txt
deactivate
One way to ensure you’re accounting for all the dependencies would be
to write a script to deploy a virtual environment, install setuptools, and then
install all of the dependencies required for both your application/library
runtime and unit tests. Luckily, this is such a popular use case that an appli-
cation dedicated to this task has already been built: tox.
The tox management tool aims to automate and standardize how tests
are run in Python. To that end, it provides everything needed to run an
entire test suite in a clean virtual environment, while also installing your
application to check that the installation works.
Before using tox, you need to provide a configuration file named tox.ini
that should be placed in the root directory of your project, beside your
setup.py file:
$ touch tox.ini
% tox
GLOB sdist-make: /home/jd/project/setup.py
python create: /home/jd/project/.tox/python
python inst: /home/jd/project/.tox/dist/project-1.zip
____________________ summary _____________________
python: commands succeeded
congratulations :)
At this point, tox creates the virtual environment and installs your
application, but it runs no tests yet. To change that, add a command to the
testenv section of tox.ini:

[testenv]
commands=pytest
Now tox runs the command pytest. However, since we do not have pytest
installed in the virtual environment, this command will likely fail. We need
to list pytest as a dependency to be installed:
[testenv]
deps=pytest
commands=pytest
When run now, tox re-creates the environment, installs the new depen-
dency, and runs the command pytest, which executes all of the unit tests.
To add more dependencies, you can either list them in the deps configura-
tion option, as is done here, or use the -rfile syntax to read from a file.
Re-creating an Environment
Sometimes you’ll need to re-create an environment to, for example, ensure
things work as expected when a new developer clones the source code
repository and runs tox for the first time. For this, tox accepts a --recreate
option that will rebuild the virtual environment from scratch based on
parameters you lay out.
You define the parameters for all virtual environments managed by tox
in the [testenv] section of tox.ini. And, as mentioned, tox can manage mul-
tiple Python virtual environments—indeed, it is possible to run our tests
under a Python version other than the default one by passing the -e flag to
tox, like so:
% tox -e py26
GLOB sdist-make: /home/jd/project/setup.py
py26 create: /home/jd/project/.tox/py26
py26 installdeps: pytest
py26 inst: /home/jd/project/.tox/dist/rebuildd-1.zip
py26 runtests: commands[0] | pytest
--snip--
== test session starts ==
=== 5 passed in 4.87 seconds ====
[testenv]
deps=pytest
commands=pytest
[testenv:py36-coverage]
deps={[testenv]deps}
     pytest-cov
commands=pytest --cov=myproject
Using Different Python Versions
We can also create a new environment with an unsupported version of
Python right away with the following in tox.ini:
[testenv]
deps=pytest
commands=pytest
[testenv:py21]
basepython=python2.1
When we run this, it will now (attempt to) use Python 2.1 to run the
test suite—although since it is very unlikely you have this ancient Python
version installed on your system, I doubt this would work for you!
It’s likely that you’ll want to support multiple Python versions, in which
case it would be useful to have tox run all the tests for all the Python ver-
sions you want to support by default. You can do this by specifying the envi-
ronment list you want to use when tox is run without arguments:
[tox]
envlist=py35,py36,pypy
[testenv]
deps=pytest
commands=pytest
When tox is launched without any further arguments, all three environ-
ments listed are created, populated with the dependencies and the applica-
tion, and then run with the command pytest.
You can also use tox to run jobs that are not unit tests, such as a flake8
check, by adding a dedicated environment to the list:
[tox]
envlist=py35,py36,pypy,pep8
[testenv]
deps=pytest
commands=pytest
[testenv:pep8]
deps=flake8
commands=flake8
In this case, the pep8 environment is run using the default version of
Python, which is probably fine, though you can still specify the basepython
option if you want to change that.
When running tox, you’ll notice that all the environments are built
and run sequentially. This can make the process very long, but since virtual
environments are isolated, nothing prevents you from running tox com-
mands in parallel. This is exactly what the detox package does, by providing
a detox command that runs all of the default environments from envlist in
parallel. You should pip install it!
Testing Policy
Embedding testing code in your project is an excellent idea, but how that
code is run is also extremely important. Too many projects have test code
lying around that fails to run for some reason or other. This topic is not
strictly limited to Python, but I consider it important enough to emphasize
here: you should have a zero-tolerance policy regarding untested code. No
code should be merged without a proper set of unit tests to cover it.
The minimum you should aim for is that each of the commits you push
passes all the tests. Automating this process is even better. For example,
OpenStack relies on a specific workflow based on Gerrit (a web-based code
review service) and Zuul (a continuous integration and delivery service). Each
commit pushed goes through the code review system provided by Gerrit, and
Zuul is in charge of running a set of testing jobs. Zuul runs the unit tests and
various higher-level functional tests for each project. This code review, which
is executed by a couple of developers, makes sure all code committed has
associated unit tests.
If you’re using the popular GitHub hosting service, Travis CI is a
tool that allows you to run tests after each push or merge or against pull
requests that are submitted. While it is unfortunate that this testing is done
post-push, it’s still a fantastic way to track regressions. Travis supports all
significant Python versions out of the box, and it can be customized signifi-
cantly. Once you’ve activated Travis on your project via the web interface
at https://www.travis-ci.org/, just add a .travis.yml file that will determine how
the tests are run. Listing 6-15 shows an example of a .travis.yml file.
language: python
python:
  - "2.7"
  - "3.6"
# command to install dependencies
install: "pip install -r requirements.txt --use-mirrors"
# command to run tests
script: pytest
With this file in place in your code repository and Travis enabled, the
latter will spawn a set of jobs to test your code with the associated unit tests.
It’s easy to see how you can customize this by simply adding dependencies
and tests. Travis is a paid service, but the good news is that for open source
projects, it’s entirely free!
The tox-travis package (https://pypi.python.org/pypi/tox-travis/) is also
worth looking into, as it will polish the integration between tox and Travis
by running the correct tox target depending on the Travis environment
being used. Listing 6-16 shows an example of a .travis.yml file that will
install tox-travis before running tox.
sudo: false
language: python
python:
  - "2.7"
  - "3.4"
install: pip install tox-travis
script: tox
Using tox-travis, you can simply call tox as the script on Travis, and
it will call tox with the environment you specify here in the .travis.yml file,
building the necessary virtual environment, installing the dependency, and
running the commands you specified in tox.ini. This makes it easy to use the
same workflow both on your local development machine and on the Travis
continuous integration platform.
These days, wherever your code is hosted, it is always possible to apply
some automatic testing of your software and to make sure your project is
moving forward, not being held back by the addition of bugs.
Where the cost of testing is very high and the returns are very low,
I think it’s fine to make an informed decision not to test, but that situa-
tion is relatively rare: most things can be tested reasonably cheaply, and
the benefit of catching errors early is usually quite high.
What are the best strategies when writing Python code to make testing
manageable and improve the quality of the code?
Separate out concerns and don’t do multiple things in one place; this
makes reuse natural, and that makes it easier to put test doubles in
place. Take a purely functional approach when possible; for example,
in a single method either calculate something or change some state,
but avoid doing both. That way you can test all of the calculating
behaviors without dealing with state changes, such as writing to a
database or talking to an HTTP server. The benefit works the other
way around too—you can replace the calculation logic for tests to pro-
voke corner case behavior and use mocks and test doubles to check
that the expected state propagation happens as desired. The most
heinous things to test are deeply layered stacks with complex cross-
layer behavioral dependencies. There you want to evolve the code so
that the contract between layers is simple, predictable, and—most use-
fully for testing—replaceable.
7
Methods and Decorators
Creating Decorators
The odds are good that you’ve already used decorators to make your own
wrapper functions. The dullest possible decorator, and the simplest example,
is the identity() function, which does nothing except return the original
function. Here is its definition:
def identity(f):
return f
@identity
def foo():
return 'bar'
You enter the name of the decorator preceded by an @ symbol and then
enter the function you want to use it on. This is the same as writing the
following:
def foo():
return 'bar'
foo = identity(foo)
This decorator is useless, but it works. Let’s look at another, more useful
example in Listing 7-1.
_functions = {}
def register(f):
global _functions
_functions[f.__name__] = f
return f
@register
def foo():
return 'bar'
In Listing 7-1, the register decorator stores the decorated function
name into a dictionary. The _functions dictionary can then be used and
accessed using the function name to retrieve a function: _functions['foo']
points to the foo() function.
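For example, once the module above has been imported, the registry can be
queried by name:
>>> _functions['foo']
<function foo at 0x...>
>>> _functions['foo']()
'bar'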
In the following sections, I will explain how to write your own decora-
tors. Then I’ll cover how the built-in decorators provided by Python work
and explain how (and when) to use them.
Writing Decorators
As mentioned, decorators are often used when refactoring repeated code
around functions. Consider the following set of functions that need to check
whether the username they receive as an argument is the admin or not and,
if the user is not an admin, raise an exception:
class Store(object):
    def get_food(self, username, food):
        if username != 'admin':
            raise Exception("This user is not allowed to get food")
        return self.storage.get(food)

    def put_food(self, username, food):
        if username != 'admin':
            raise Exception("This user is not allowed to put food")
        self.storage.put(food)
We can see there’s some repeated code here. The obvious first step to
making this code more efficient is to factor the code that checks for admin
status:
def check_is_admin(username):
    if username != 'admin':
        raise Exception("This user is not allowed to get or put food")

class Store(object):
    def get_food(self, username, food):
        check_is_admin(username)
        return self.storage.get(food)

    def put_food(self, username, food):
        check_is_admin(username)
        self.storage.put(food)
We’ve moved the checking code into its own function u. Now our code
looks a bit cleaner, but we can do even better if we use a decorator, as shown
in Listing 7-2.
def check_is_admin(f):
    def wrapper(*args, **kwargs):
        if kwargs.get('username') != 'admin':
            raise Exception("This user is not allowed to get or put food")
        return f(*args, **kwargs)
    return wrapper
class Store(object):
@check_is_admin
def get_food(self, username, food):
return self.storage.get(food)
@check_is_admin
def put_food(self, username, food):
self.storage.put(food)
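A quick interactive check shows the decorator rejecting a non-admin user
before the method body (and its storage attribute) is ever reached:
>>> store = Store()
>>> store.get_food(username='user123', food='chocolate')
Traceback (most recent call last):
    ...
Exception: This user is not allowed to get or put food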
Stacking Decorators
You can also use several decorators on top of a single function or method,
as shown in Listing 7-3.
def check_user_is_not(username):
def user_check_decorator(f):
def wrapper(*args, **kwargs):
if kwargs.get('username') == username:
raise Exception("This user is not allowed to get food")
return f(*args, **kwargs)
return wrapper
return user_check_decorator
class Store(object):
@check_user_is_not("admin")
@check_user_is_not("user123")
def get_food(self, username, food):
return self.storage.get(food)
Listing 7-3: Using more than one decorator with a single function
Here, check_user_is_not() is a factory function that builds the actual
decorator, parameterized by the username to reject. Stacking the two
decorators is equivalent to wrapping the method by hand, innermost first:
class Store(object):
def get_food(self, username, food):
return self.storage.get(food)
Store.get_food = check_user_is_not("user123")(Store.get_food)
Store.get_food = check_user_is_not("admin")(Store.get_food)
Decorators can be applied to classes as well as to functions and methods.
For example, this class decorator sets two attributes when the class is
defined:
import uuid
def set_class_name_and_id(klass):
klass.name = str(klass)
klass.random_id = uuid.uuid4()
return klass
@set_class_name_and_id
class SomeClass(object):
pass
When the class is loaded and defined, it will set the name and random_id
attributes, like so:
>>> SomeClass.name
"<class '__main__.SomeClass'>"
>>> SomeClass.random_id
UUID('d244dc42-f0ca-451c-9670-732dc32417cd')
Decorators can also be implemented as classes: storing the wrapped
function in __init__ and forwarding calls in __call__ lets the decorator
keep state across calls, such as a call counter:
class CountCalls(object):
    def __init__(self, f):
        self.f = f
        self.called = 0

    def __call__(self, *args, **kwargs):
        self.called += 1
        return self.f(*args, **kwargs)
@CountCalls
def print_hello():
print("hello")
We can then use this to check how many times the function print_hello()
has been called:
>>> print_hello.called
0
>>> print_hello()
hello
>>> print_hello.called
1
Whether written as a closure or as a class, a decorator built this way has a
drawback: the wrapper replaces the original function, taking along its own
metadata, as shown in Listing 7-4.
>>> def foobar(username="someone"):
...     """Do crazy stuff."""
...     pass
...
>>> foobar = check_is_admin(foobar)
>>> foobar.__name__
'wrapper'
>>> foobar.__doc__ is None
True
Listing 7-4: A decorated function loses its docstring and name attributes.
Not having the correct docstring and name attribute for a function can
be problematic in various situations, such as when generating the source
code documentation.
Fortunately, the functools module in the Python Standard Library solves
this problem with the update_wrapper() function, which copies the attributes
from the original function that were lost to the wrapper itself. The source
code of update_wrapper(), reproduced here in essence, is shown in Listing 7-5.
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__',
                       '__annotations__', '__doc__')
WRAPPER_UPDATES = ('__dict__',)

def update_wrapper(wrapper, wrapped,
                   assigned=WRAPPER_ASSIGNMENTS,
                   updated=WRAPPER_UPDATES):
    for attr in assigned:
        try:
            value = getattr(wrapped, attr)
        except AttributeError:
            pass
        else:
            setattr(wrapper, attr, value)
    for attr in updated:
        getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
    wrapper.__wrapped__ = wrapped
    return wrapper
Listing 7-5: The source code of update_wrapper()
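For example, take a naive is_admin decorator (an illustrative stand-in for
the wrappers above) and repair its wrapper by hand:
>>> import functools
>>> def is_admin(f):
...     def wrapper(*args, **kwargs):
...         return f(*args, **kwargs)
...     return wrapper
...
>>> def foobar(username="someone"):
...     """Do crazy stuff."""
...     pass
...
>>> foobar = functools.update_wrapper(is_admin(foobar), foobar)
>>> foobar.__name__
'foobar'
>>> foobar.__doc__
'Do crazy stuff.'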
Now the foobar() function has the correct name and docstring even
when decorated by is_admin.
Writing the update_wrapper() call by hand each time would be tedious, so
functools also provides a decorator for decorators called wraps, which does
it for you:
import functools
def check_is_admin(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
if kwargs.get('username') != 'admin':
raise Exception("This user is not allowed to get food")
return f(*args, **kwargs)
return wrapper
class Store(object):
@check_is_admin
def get_food(self, username, food):
"""Get food from storage."""
return self.storage.get(food)
One flaw remains in check_is_admin: it assumes that username is always
passed as a keyword argument, so a positional call would slip past the
check. The inspect module can retrieve the mapping of argument names to
values however the function was called, as Listing 7-7 shows:
import functools
import inspect
def check_is_admin(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
func_args = inspect.getcallargs(f, *args, **kwargs)
if func_args.get('username') != 'admin':
raise Exception("This user is not allowed to get food")
return f(*args, **kwargs)
return wrapper
@check_is_admin
def get_food(username, type='chocolate'):
return type + " nom nom nom!"
Listing 7-7: Using tools from the inspect module to extract information
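With this version, check_is_admin finds username however it was passed:
>>> get_food('admin')
'chocolate nom nom nom!'
>>> get_food('someone')
Traceback (most recent call last):
    ...
Exception: This user is not allowed to get food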
To see why the distinction between method kinds matters, start with how a
regular method binds. Consider this Pizza class:
class Pizza(object):
    def __init__(self, size):
        self.size = size

    def get_size(self):
        return self.size

Accessed through the class itself rather than an instance, calling the
method fails:
>>> Pizza.get_size()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: get_size() missing 1 required positional argument: 'self'
Python complains that we have not provided the necessary self argu-
ment. Indeed, as it is not bound to any object, the self argument cannot be
set automatically. However, we can still use the get_size() function if we
pass an instance of Pizza to it ourselves:
>>> Pizza.get_size(Pizza(42))
42
This call works, just as promised. It is, however, not very convenient: we
have to refer to the class every time we want to call one of its methods.
So Python goes the extra mile for us by binding a class’s methods
to its instances. In other words, we can access get_size() from any Pizza
instance, and, better still, Python will automatically pass the object itself
to the method’s self parameter, like so:
>>> Pizza(42).get_size
<bound method Pizza.get_size of <__main__.Pizza object at 0x7f3138827910>>
>>> Pizza(42).get_size()
42
>>> m = Pizza(42).get_size
>>> m()
42
As long as you have a reference to the bound method, you do not even
have to keep a reference to your Pizza object. Moreover, if you have a refer-
ence to a method but you want to find out which object it is bound to, you
can just check the method’s __self__ property, like so:
>>> m = Pizza(42).get_size
>>> m.__self__
<__main__.Pizza object at 0x7f3138827910>
>>> m == m.__self__.get_size
True
Static Methods
Static methods belong to a class, rather than an instance of a class, so they
don’t actually operate on or affect class instances. Instead, a static method
operates on the parameters it takes. Static methods are generally used to
create utility functions, because they do not depend on the state of the
class or its objects.
For example, in Listing 7-8, the static mix_ingredients() method belongs
to the Pizza class but could actually be used to mix ingredients for any
other food.
class Pizza(object):
@staticmethod
def mix_ingredients(x, y):
return x + y
def cook(self):
return self.mix_ingredients(self.cheese, self.vegetables)
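Because a static method carries no bound state, Python does not have to
create a new bound method object for every instance or every access; a
quick check under Python 3 illustrates this, using the Pizza class above:
>>> Pizza().mix_ingredients is Pizza.mix_ingredients
True
>>> Pizza().cook is Pizza().cook
False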
Class Methods
Class methods are bound to a class rather than its instances. That means that
those methods cannot access the state of the object but only the state and
methods of the class. Listing 7-9 shows how to write a class method.
>>> class Pizza(object):
...     radius = 42
...     @classmethod
...     def get_radius(cls):
...         return cls.radius
...
>>> Pizza.get_radius
<bound method Pizza.get_radius of <class '__main__.Pizza'>>
>>> Pizza().get_radius
<bound method Pizza.get_radius of <class '__main__.Pizza'>>
>>> Pizza.get_radius()
42
Listing 7-9: Binding a class method to its class
As you can see, there are various ways to access the get_radius() class
method, but however you choose to access it, the method is always bound
to the class it is attached to. Also, its first argument must be the class itself.
Remember: classes are objects too!
Class methods are principally useful for creating factory methods, which
instantiate objects using a different signature than __init__:
class Pizza(object):
def __init__(self, ingredients):
self.ingredients = ingredients
@classmethod
def from_fridge(cls, fridge):
return cls(fridge.get_cheese() + fridge.get_vegetables())
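A hypothetical Fridge class (not part of the original example) shows the
factory at work; because the method receives the class as cls, a subclass of
Pizza calling from_fridge() would get an instance of that subclass:
>>> class Fridge(object):
...     def get_cheese(self):
...         return ['cheese']
...     def get_vegetables(self):
...         return ['mushrooms']
...
>>> Pizza.from_fridge(Fridge()).ingredients
['cheese', 'mushrooms']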
Abstract Methods
An abstract method is defined in an abstract base class that may not itself pro-
vide any implementation. When a class has an abstract method, it cannot
be instantiated. As a consequence, an abstract class (defined as a class that
has at least one abstract method) must be used as a parent class by another
class. This subclass will be in charge of implementing the abstract method,
which makes the subclass itself possible to instantiate.
We can use abstract base classes to make clear the relationships
between other, connected classes derived from the base class but make the
abstract base class itself impossible to instantiate. By using abstract base
classes, you can ensure the classes derived from the base class implement
particular methods from the base class, or an exception will be raised. The
following example shows the simplest way to write an abstract method in
Python:
class Pizza(object):
@staticmethod
def get_radius():
raise NotImplementedError
With this definition, any class inheriting from Pizza must implement
and override the get_radius() method; otherwise, calling the method raises
the exception shown here. This is handy for making sure that each subclass
of Pizza implements its own way of computing and returning its radius.
This way of implementing abstract methods has a drawback: if you
write a class that inherits from Pizza but forget to implement get_radius(),
the error is raised only if you try to use that method at runtime. Here’s an
example:
>>> Pizza()
<__main__.Pizza object at 0x7fb747353d90>
>>> Pizza().get_radius()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in get_radius
NotImplementedError
To raise the error earlier, at instantiation time, use the abc module, which
flags methods as abstract and refuses to instantiate any class that does not
override all of them:
import abc

class BasePizza(object, metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def get_radius(self):
        """Method that should do something."""

With this definition, trying to instantiate BasePizza fails immediately:
>>> BasePizza()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class BasePizza with abstract methods
get_radius
Implementations of an abstract method are also free to extend its signa-
ture, as Listing 7-10 shows:
import abc

class BasePizza(object, metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def get_ingredients(self):
        """Returns the ingredient list."""
class Calzone(BasePizza):
def get_ingredients(self, with_egg=False):
egg = Egg() if with_egg else None
return self.ingredients + [egg]
Listing 7-10: Using a subclass to extend the signature of the abstract method of its parent
We define the Calzone subclass to inherit from the BasePizza class. We can
define the Calzone subclass’s methods any way we like, as long as they support
the interface we define in BasePizza. This includes implementing the methods
as either class or static methods. The following code defines an abstract
get_ingredients() method in the base class and a static get_ingredients()
method in the DietPizza subclass:
import abc

class BasePizza(object, metaclass=abc.ABCMeta):

    @abc.abstractmethod
    def get_ingredients(self):
        """Returns the ingredient list."""
class DietPizza(BasePizza):
@staticmethod
def get_ingredients():
return None
You can even stack @classmethod or @staticmethod on top of
@abc.abstractmethod, and abstract methods may carry an implementation:
import abc

class BasePizza(object, metaclass=abc.ABCMeta):

    ingredients = ['cheese']

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
        """Returns the ingredient list."""
        return cls.ingredients
Since abstract methods can contain code, concrete implementations can
call them back through super():
import abc

class BasePizza(object, metaclass=abc.ABCMeta):

    default_ingredients = ['cheese']

    @classmethod
    @abc.abstractmethod
    def get_ingredients(cls):
        """Returns the default ingredient list."""
        return cls.default_ingredients
class DietPizza(BasePizza):
def get_ingredients(self):
return [Egg()] + super(DietPizza, self).get_ingredients()
In this example, every Pizza you make that inherits from BasePizza has
to override the get_ingredients() method, but every Pizza also has access to
the base class’s default mechanism for getting the ingredients list. This
mechanism is especially useful when providing an interface to implement
while also providing base code that might be useful to all inheriting classes.
Note Many of the pros and cons of single and multiple inheritances, composition, or even
duck typing are out of scope for this book, so we won’t cover everything here. If you
are not familiar with these notions, I suggest you read about them to form your own
opinions.
As you should know by now, classes are objects in Python. The construct
used to create a class is a special statement that you should be well familiar
with: class classname(expression of inheritance).
The code in parentheses is a Python expression that returns the list of
class objects to be used as the class’s parents. Ordinarily, you would specify
them directly, but you could also write something like this to specify the list
of parent objects:
>>> def parent():
...     return object
...
>>> class A(parent()):
...     pass
...
>>> A.mro()
[<class '__main__.A'>, <class 'object'>]

When multiple inheritance is involved, attribute resolution is handled
through super(), which is, in fact, a constructor that instantiates a super
object. Take a minimal hierarchy in which A defines foo, B defines bar, and
C inherits from both while defining xyz:
class A(object):
    foo = 'foo'

class B(object):
    bar = 'bar'

class C(A, B):
    xyz = 'xyz'

Listing 7-13: The super() function is a constructor that instantiates a super object.
Calling super() with only the class as argument returns an unbound super
object:
>>> super(C)
<super: <class 'C'>, NULL>
Since no instance has been provided as the second argument, the super
object cannot be bound to any instance. Therefore, you cannot use this
unbound object to access class attributes. If you try, you’ll get the follow-
ing errors:
>>> super(C).foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'super' object has no attribute 'foo'
>>> super(C).bar
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'super' object has no attribute 'bar'
>>> super(C).xyz
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'super' object has no attribute 'xyz'
At first glance, it might seem like this unbound kind of super object is use-
less, but actually the way the super class implements the descriptor protocol
__get__ makes unbound super objects useful as class attributes:
>>> class D(C):
...     sup = super(C)
...
>>> D().sup
<super: <class 'C'>, <D object>>
>>> D().sup.foo
'foo'
The unbound super object's __get__ method is called using the instance
super(C).__get__(D()) and the attribute name 'foo' as arguments, allowing it
to find and resolve foo.
Note Even if you’ve never heard of the descriptor protocol, it’s likely you’ve used it through
the @property decorator without knowing it. The descriptor protocol is the mecha-
nism in Python that allows an object stored as an attribute to return something other
than itself. This protocol is not covered in this book, but you can find out more about
it in the Python data model documentation.
There are plenty of situations in which using super() can be tricky, such
as when handling different method signatures along the inheritance chain.
Unfortunately, there’s no silver bullet for all occasions. The best precau-
tion is to use tricks such as having all your methods accept their arguments
using *args, **kwargs.
Since Python 3, super() has picked up a bit of magic: it can now be
called from within a method without any arguments. When no argu-
ments are passed to super(), it automatically searches the stack frame
for arguments:
class A(object):
    def foo(self):
        print("A.foo")

class B(A):
    def foo(self):
        super().foo()
Summary
Equipped with what you learned in this chapter, you should be unbeatable
on everything that concerns method definition in Python. Decorators are
essential when it comes to code factorization, and proper use of the built-
in decorators provided by Python can vastly improve the neatness of your
Python code. Abstract classes are especially useful when providing an API
to other developers and services.
Class inheritance is not often fully understood, and having an overview
of the internal machinery of the language is a good way to fully apprehend
how this works. There should be no secrets left on this topic for you now!
8
Functional Programming

A purely functional function never modifies its arguments or the program's
state; compare a function relying on a side effect with its functional
equivalent:
def remove_last_item(mylist):
    """Removes the last item from a list."""
    mylist.pop(-1)  # This modifies mylist

def butlast(mylist):
    """Like butlast in Lisp; returns the list without the last element."""
    return mylist[:-1]  # This returns a copy of mylist
Testability Testing a functional program is incredibly easy: all you
need is a set of inputs and an expected set of outputs. Pure functions are
deterministic: calling the same function over and over with the
same arguments will always return the same result.
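For example, a test for the butlast() function above needs no setup, tear-
down, or mocking at all:
def test_butlast():
    assert butlast([1, 2, 3]) == [1, 2]
    assert butlast([]) == []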
Generators
A generator is an object that behaves like an iterator, in that it generates and
returns a value on each call of its next() method until a StopIteration is raised.
Generators, first introduced in PEP 255, offer an easy way to create objects
that implement the iterator protocol. While writing generators in a functional
style is not strictly necessary, doing so makes them easier to write and debug
and is a common practice.
To create a generator, just write a regular Python function that contains
a yield statement. Python will detect the use of yield and tag the function
as a generator. When execution reaches the yield statement, the function
returns a value as with a return statement, but with one notable difference:
the interpreter will save a stack reference, and this will be used to resume the
function’s execution when the next() function is called again.
When functions are executed, the chaining of their execution produces
a stack—function calls are said to be stacked on each other. When a func-
tion returns, it’s removed from the stack, and the value it returns is passed
to the calling function. In the case of a generator, the function does not
really return but yields instead. Python therefore saves the state of the func-
tion as a stack reference, resuming the execution of the generator at the
point it saved when the next iteration of the generator is needed.
Creating a Generator
As mentioned, you create a generator by writing a normal function and
including yield in the function’s body. Listing 8-1 creates a generator called
mygenerator() that includes three yields, meaning it will iterate with the next
three calls to next().
>>> def mygenerator():
...     yield 1
...     yield 2
...     yield 'a'
...
>>> g = mygenerator()
>>> next(g)
1
>>> next(g)
2
>>> next(g)
'a'
Listing 8-1: A generator with three yield statements
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Suppose you need to find the number 50,000 among the first 10 million
integers while the available memory is capped at 128MB:
$ ulimit -v 131072
$ python3
>>> a = list(range(10000000))
This naive method first tries to build the list, but if we run the program
so far:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Uh-oh. Turns out we can't build a list of 10 million items with only 128MB
of memory!
Let’s try using a generator instead, with the same 128MB restriction:
$ ulimit -v 131072
$ python3
>>> for value in range(10000000):
... if value == 50000:
... print("Found it")
... break
...
Found it
This time, our program executes without issue. When it is iterated over,
the range() class returns a generator that dynamically generates our list of
integers. Better still, since we are only interested in the 50,000th number,
instead of building the full list, the generator only had to generate 50,000
numbers before it stopped.
By generating values on the fly, generators allow you to handle large
data sets with minimal consumption of memory and processing cycles.
Whenever you need to work with a huge number of values, generators can
help you handle them efficiently.
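As a sketch of the same idea applied to I/O, here is a generator that streams
a file in fixed-size chunks, holding only one chunk in memory at a time (the
path and chunk size are illustrative):
def read_in_chunks(path, chunk_size=8192):
    """Yield successive chunks read from the file at path."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk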
A yield is also an expression: the generator can receive a value sent back
by its caller through the send() method. The following shorten() generator
yields each string truncated to a length that the caller can adjust as itera-
tion proceeds:
def shorten(string_list):
length = len(string_list[0])
for s in string_list:
length = yield s[:length]
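A possible interactive session (the input strings are illustrative): the value
passed to send() becomes the result of the yield expression and sets the
next truncation length.
>>> sl = shorten(['loremipsum', 'dolorsit', 'ametfoobar'])
>>> next(sl)
'loremipsum'
>>> sl.send(4)
'dolo'
>>> sl.send(2)
'am'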
Using yield and send() in this fashion allows Python generators to func-
tion like coroutines seen in Lua and other languages.
PEP 289 introduced generator expressions, making it possible to build
one-line generators using a syntax similar to list comprehension:
>>> gen = (x.upper() for x in ['hello', 'world'])
>>> gen
<generator object <genexpr> at 0x...>
>>> list(gen)
['HELLO', 'WORLD']
In this example, gen is a generator, just as if we had used the yield state-
ment. The yield in this case is implicit.
Inspecting Generators
To determine whether a function is considered a generator, use inspect
.isgeneratorfunction(). In Listing 8-3, we create a simple generator and
inspect it:
>>> import inspect
>>> def mygenerator():
...     yield 1
...
>>> inspect.isgeneratorfunction(mygenerator)
True
>>> inspect.isgeneratorfunction(sum)
False
Listing 8-3: Checking whether a function is a generator
Reading the source code of inspect.isgeneratorfunction() shows how the
test works:
def isgeneratorfunction(object):
    """Return true if the object is a user-defined generator function."""
    return bool((isfunction(object) or ismethod(object)) and
                object.__code__.co_flags & CO_GENERATOR)
The inspect package provides the inspect.getgeneratorstate() function,
which gives the current state of the generator. We’ll use it on mygenerator()
here at different points of execution:
>>> import inspect
>>> gen = mygenerator()
>>> inspect.getgeneratorstate(gen)
'GEN_CREATED'
>>> next(gen)
1
>>> inspect.getgeneratorstate(gen)
'GEN_SUSPENDED'
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> inspect.getgeneratorstate(gen)
'GEN_CLOSED'
List Comprehensions
List comprehension, or listcomp for short, allows you to define a list’s con-
tents inline with its declaration. To make a list into a listcomp, you must
wrap it in square brackets as usual, but also include an expression that will
generate the items in the list and a for loop to loop through them.
The following example creates a list without using list comprehension:
>>> x = []
>>> for i in (1, 2, 3):
... x.append(i)
...
>>> x
[1, 2, 3]
And this next example uses list comprehension to make the same list
with a single line:
>>> x = [i for i in (1, 2, 3)]
>>> x
[1, 2, 3]
Using a list comprehension presents two advantages: code written using
listcomps is usually shorter and therefore compiles down to fewer opera-
tions for Python to perform. Rather than creating a list and calling append
over and over, Python can just create the list of items and move them into a
new list in a single operation.
You can use multiple for statements together and use if statements to
filter out items. Here we create a list of words and use list comprehension
to capitalize each item, split up items with multiple words into single words,
and delete the extraneous word or:
x = [word.capitalize()
for line in ("hello world?", "world!", "or not")
for word in line.split()
if not word.startswith("or")]
>>> x
['Hello', 'World?', 'World!', 'Not']
This code has two for loops: the first iterates over the text lines, while
the second iterates over words in each of those lines. The final if statement
filters out words that start with or to exclude them from the final list.
Using list comprehension rather than for loops is a neat way to define
lists quickly. Since we’re still talking about functional programming, it’s
worth noting that lists built through list comprehension shouldn’t rely on
changing the program’s state: you are not expected to modify any variable
while building the list. This usually makes the lists more concise and easier
to read than lists made without listcomp.
Note that there’s also syntax for building dictionaries or sets in the
same fashion, like so:
>>> {x: x.upper() for x in ['hello', 'world']}
{'hello': 'HELLO', 'world': 'WORLD'}
>>> {x.upper() for x in ['hello', 'world']}
{'HELLO', 'WORLD'}
Applying Functions to Items with map()
The map() function takes the form map(function, iterable) and applies function
to each item in iterable to return a list in Python 2 or an iterable map object in
Python 3, as shown in Listing 8-5:
>>> list(map(lambda x: x + "bzz!", ["I think", "I'm good"]))
['I thinkbzz!', "I'm goodbzz!"]
Listing 8-5: Using map() with lambda
Likewise, the built-in enumerate() function saves you from maintaining an
index counter yourself. Instead of writing:
i = 0
while i < len(mylist):
print("Item %d: %s" % (i, mylist[i]))
i += 1
you could accomplish the same thing more efficiently with enumerate(), like so:
for i, item in enumerate(mylist):
    print("Item %d: %s" % (i, item))
The built-in functions all() and any() are also useful in a functional style;
they are roughly equivalent to this pure Python:
def all(iterable):
for x in iterable:
if not x:
return False
return True
def any(iterable):
for x in iterable:
if x:
return True
return False
These functions are useful for checking whether any or all of the values
in an iterable satisfy a given condition. For example, the following checks a
list for two conditions:
mylist = [0, 1, 3, -1]
if all(map(lambda x: x > 0, mylist)):
    print("All items are greater than 0")
if any(map(lambda x: x > 0, mylist)):
    print("At least one item is greater than 0")
The difference here is that any() returns True when at least one element
meets the condition, while all() returns True only if every element meets
the condition. The all() function will also return True for an empty iterable,
since an empty iterable contains no element that is false.
Combining Lists with zip()
The zip() function takes the form zip(iter1 [,iter2 [...]]). It takes multi-
ple sequences and combines them into tuples. This is useful when you need
to combine a list of keys and a list of values into a dict. As with the other
functions described here, zip() returns a list in Python 2 and an iterable in
Python 3. Here we map a list of keys to a list of values to create a dictionary:
>>> keys = ['foobar', 'barzz', 'ba!']
>>> list(zip(keys, map(len, keys)))
[('foobar', 6), ('barzz', 5), ('ba!', 3)]
>>> dict(zip(keys, map(len, keys)))
{'foobar': 6, 'barzz': 5, 'ba!': 3}
You might have noticed by now how the return types differ between Python 2
and Python 3. Most of Python’s purely functional built-in functions return a list
rather than an iterable in Python 2, making them less memory efficient than
their Python 3.x equivalents. If you’re planning to write code using these func-
tions, keep in mind that you’ll get the most benefit out of them in Python 3.
If you’re stuck with Python 2, don’t despair: the itertools module from the
Standard Library provides an iterator-based version of many of these functions
(itertools.izip(), itertools.imap(), itertools.ifilter(), and so on).
A common task is finding the first item of an iterable that satisfies a condi-
tion. An imperative version looks like this:
def first_positive_number(numbers):
for n in numbers:
if n > 0:
return n
We could rewrite the first_positive_number() function in functional style
like this:
def first_positive_number(numbers):
    return next(filter(lambda x: x > 0, numbers))

There are several ways to get at the first matching element, and they are
not equally efficient:
# Less efficient
list(filter(lambda x: x > 0, [-1, 0, 1, 2]))[0]
# Efficient
next(filter(lambda x: x > 0, [-1, 0, 1, 2]))
Note that the list(filter()) version may raise an IndexError if no items
satisfy the condition, since list(filter()) would then return an empty list;
similarly, next() raises StopIteration when no item matches. For simple
cases, you can pass a default value as the second argument of next() to
prevent such errors from occurring, like so:
>>> a = range(10)
>>> next(x for x in a if x > 3)
4
>>> a = range(10)
>>> next((x for x in a if x > 10), 'default')
'default'
Listing 8-6: Returning a default value when the condition is not met
This will return a default value rather than an error when a condition
cannot be met. Lucky for us, the small first package handles all of this
for us:
>>> from first import first
>>> first([0, False, None, [], (), 42])
42
>>> first([-1, 0, 1, 2])
-1
>>> first([-1, 0, 1, 2], key=lambda x: x > 0)
1
Listing 8-7: Finding the first item in a list that satisfies a condition
You see that the first() function returns the first valid, non-empty item
in a list.
import operator
from first import first
def greater_than_zero(number):
return number > 0

first([-1, 0, 1, 2], key=greater_than_zero)
Listing 8-8: Finding the first item to meet the condition, without using lambda()
This code works identically to that in Listing 8-7, returning the first
non-empty value in a list to meet the condition, but it’s a good deal more
cumbersome: if we wanted to get the first number in the sequence that’s
longer than, say, 42 items, we’d need to define an appropriate function via
def rather than defining it inline with our call to first().
But despite its usefulness in helping us avoid situations like this, lambda
still has its problems. The first module accepts a key argument: a function
that receives each item as an argument and returns a Boolean indicating
whether it satisfies the condition. However, we can't pass a lambda as the
key if its logic needs more than a single expression, because a lambda cannot
span more than one line. That is a significant limitation of lambda.
Instead, we would have to go back to the cumbersome pattern of writing
new function definitions for each key we need. Or would we?
The functools package comes to the rescue with its partial() method,
which provides us with a more flexible alternative to lambda. The functools
.partial() method allows us to create a wrapper function with a twist: rather
than changing the behavior of a function, it instead changes the arguments
it receives, like so:
from functools import partial
from first import first

def greater_than(number, min=0):
    return number > min

first([-1, 0, 1, 2], key=partial(greater_than, min=42))
Here we create a new greater_than() function that works just like the
old greater_than_zero() from Listing 8-8 by default, but this version allows
us to specify the value we want to compare our numbers to, whereas before
it was hardcoded. Here, we pass our greater_than() function and the value
we want for min to functools.partial(), and we get back a new function that
has min set to 42, just as we want. In other words, we can write a function and use
functools.partial() to customize the behavior of our new functions to suit
our needs in any given situation.
Even this version can be pared down. All we’re doing in this example is
comparing two numbers, and as it turns out, the operator module has built-
in functions for exactly that:
import operator
from functools import partial
from first import first
first([-1, 0, 1, 2], key=partial(operator.lt, 0))

Here, partial(operator.lt, 0) builds a key function that checks whether
0 is less than each item, so neither a lambda nor a custom function is
needed.
The itertools module in the Python Standard Library provides many more
functions that are all designed to help you manipulate iterators (that's why
the module is called iter-tools) and therefore are all purely functional. Here
I'll list a few of
them and give a brief overview of what they do, and I encourage you to look
into them further if they seem of use.
For example, itertools.groupby() combined with operator.itemgetter()
groups an iterable of dictionaries by one of their keys:
>>> import itertools
>>> import operator
>>> a = [{'foo': 'bar'}, {'foo': 'bar', 'x': 42}, {'foo': 'baz', 'y': 43}]
>>> [(key, list(group))
...  for key, group in itertools.groupby(a, operator.itemgetter('foo'))]
[('bar', [{'foo': 'bar'}, {'foo': 'bar', 'x': 42}]), ('baz', [{'foo': 'baz', 'y': 43}])]
In this case, we could have also written lambda x: x['foo'], but using
operator lets us avoid having to use lambda at all.
Summary
While Python is often advertised as being object oriented, it can be used in
a very functional manner. A lot of its built-in concepts, such as generators
and list comprehension, are functionally oriented and don’t conflict with
an object-oriented approach. They also limit the reliance on a program’s
global state, for your own good.
Using functional programming as a paradigm in Python can help you
make your program more reusable and easier to test and debug, supporting
the Don’t Repeat Yourself (DRY) mantra. In this spirit, the standard Python
modules itertools and operator are good tools to improve the readability of
your functional code.
9
The Abstract Syntax Tree, Hy, and Lisp-like Attributes
>>> import ast
>>> tree = ast.parse('x = 42')
>>> tree
<_ast.Module object at 0x...>
>>> ast.dump(tree)
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=42))])"
Listing 9-1: Using the ast module to dump the AST generated by parsing code
The ast.parse() function parses any string that contains Python code and
returns an _ast.Module object. That object is actually the root of the tree: you
can browse it to discover every node making up the tree. To visualize what
the tree looks like, you can use the ast.dump() function, which will return a
string representation of the whole tree.
In Listing 9-1, the code x = 42 is parsed with ast.parse(), and the result is
printed using ast.dump(). This abstract syntax tree can be rendered as shown
in Figure 9-1, which shows the structure of the Python assign command.
Figure 9-1: The AST of x = 42: a Module whose body contains an Assign node; the assignment's target is the Name x (with a Store context), and its value is the Num 42.
The AST always starts with a root element, which is usually an _ast.Module
object. This module object contains a list of statements or expressions to eval-
uate in its body attribute and usually represents the content of a file.
As you can probably guess, the ast.Assign object shown in Figure 9-1
represents an assignment, which is mapped to the = sign in the Python syn-
tax. An ast.Assign object has a list of targets and a value to set the targets to.
The list of targets in this case consists of one object, ast.Name, which rep-
resents a variable whose ID is x. The value is a number n with a value (in
this case) 42. The ctx attribute stores a context, either ast.Store or ast.Load,
depending on whether the variable is being used for reading or writing. In
this case, the variable is being assigned a value, so an ast.Store context is used.
We could pass this AST to Python to be compiled and evaluated via the
built-in compile() function. This function takes an AST as argument, the
source filename, and a mode (either 'exec', 'eval', or 'single'). The source
filename can be any name that you want your AST to appear to be from; it
is common to use the string <input> as the source filename if the data does
not come from a stored file, as shown in Listing 9-2.
>>> compile(tree, '<input>', 'exec')
<code object <module> at 0x...>
>>> eval(compile(tree, '<input>', 'exec'))
>>> x
42
Listing 9-2: Using the compile() function to compile data that is not from a stored file
The modes stand for execute (exec), evaluate (eval), and single state-
ment (single). The mode should match what has been given to ast.parse(),
whose default is exec.
• The exec mode is the normal Python mode, used when an _ast.Module is
the root of the tree.
• The eval mode is a special mode that expects a single ast.Expression as
the tree.
• Finally, single is another special mode that expects a single statement or
expression. If it gets an expression, sys.displayhook() will be called with
the result, as when code is run in the interactive shell.
In single mode, the root of the AST is an ast.Interactive object, and its
body attribute is a list of nodes.
We could build an AST manually using the classes provided in the ast
module. Obviously, this is a very long way to write Python code and not a
method I would recommend! Nonetheless, it’s fun to do and helpful for
learning about the AST. Let’s see what programming with the AST would
look like.
In Listing 9-3, we build the tree one leaf at a time, where each leaf is an
element (whether a value or an instruction) of the program.
>>> import ast
>>> hello_world = ast.Str(s='hello world!', lineno=1, col_offset=1)
>>> print_name = ast.Name(id='print', ctx=ast.Load(), lineno=1, col_offset=1)
>>> print_call = ast.Call(func=print_name, args=[hello_world], keywords=[],
...                       lineno=1, col_offset=1)
>>> module = ast.Module(body=[ast.Expr(print_call, lineno=1, col_offset=1)],
...                     lineno=1, col_offset=1)
>>> code = compile(module, '', 'exec')
>>> eval(code)
hello world!
Listing 9-3: Building an AST equivalent to print('hello world!') by hand
The first leaf is a simple string: the ast.Str represents a literal string,
which here contains the hello world! text. The print_name variable con-
tains an ast.Name object, which refers to a variable—in this case, the print
variable that points to the print() function.
The print_call variable contains a function call. It refers to the func-
tion name to call, the regular arguments to pass to the function call, and
the keyword arguments. Which arguments are used depends on the func-
tion being called. In this case, since it's the print() function, we'll pass the
string we made and stored in hello_world.
At last, we create an _ast.Module object to contain all this code as a list
of one expression. We can compile _ast.Module objects using the compile()
function, which parses the tree and generates a native code object. These
code objects are compiled Python code and can finally be executed by a
Python virtual machine using eval!
This whole process is exactly what happens when you run Python on a
.py file: once the text tokens are parsed, they are converted into a tree of ast
objects, compiled, and evaluated.
Note The arguments lineno and col_offset represent the line number and column offset,
respectively, of the source code that has been used to generate the AST. It doesn’t make
much sense to set these values in this context since we are not parsing a source file, but
it can be useful to be able to find the position of the code that generated the AST. For
example, Python uses this information when generating backtraces. Indeed, Python
refuses to compile an AST object that doesn’t provide this information, so we pass fake
values to these. You could also use the ast.fix_missing_locations() function to set
the missing values to the ones set on the parent node.
There are also a few smaller categories, such as the ast.operator class,
which defines standard operators such as add (+), div (/), and right shift (>>),
and the ast.cmpop class, which defines comparison operators.
The simple example here should give you an idea of how to build an
AST from scratch. It’s easy to then imagine how you might leverage this AST
to construct a compiler that would parse strings and generate code, allowing
you to implement your own syntax to Python! This is exactly what led to the
development of the Hy project, which we’ll discuss later in this chapter.
import ast
class ReplaceBinOp(ast.NodeTransformer):
"""Replace operation by addition in binary operation"""
def visit_BinOp(self, node):
return ast.BinOp(left=node.left,
op=ast.Add(),
right=node.right)
tree = ast.parse("x = 1/3")
eval(compile(tree, '', 'exec'))
print(ast.dump(tree))
print(x)

tree = ReplaceBinOp().visit(tree)
ast.fix_missing_locations(tree)
print(ast.dump(tree))
eval(compile(tree, '', 'exec'))
print(x)
The first tree object built is an AST that represents the expression
x = 1/3. Once this is compiled and evaluated, the result of printing x at the
end of the function is 0.33333, the expected result of 1/3.
The second tree object is an instance of ReplaceBinOp, which inher-
its from ast.NodeTransformer. It implements its own version of the ast.
NodeTransformer.visit() method and changes any ast.BinOp operation to
an ast.BinOp that executes ast.Add. Concretely, this changes any binary
operator (+, -, /, and so on) to the + operator. When this second tree is
compiled and evaluated , the result is now 4, which is the result of 1 + 3,
because the / in the first object is replaced with +.
Module(body=[Assign(targets=[Name(id='x', ctx=Store())],
value=BinOp(left=Num(n=1), op=Div(), right=Num(n=3)))])
0.3333333333333333
Module(body=[Assign(targets=[Name(id='x', ctx=Store())],
value=BinOp(left=Num(n=1), op=Add(), right=Num(n=3)))])
4
Note If you need to evaluate a string that should return a simple data type, you can use
ast.literal_eval. As a safer alternative to eval, it prevents the input string from
executing any code.
As an example, let's use the AST to write a flake8 extension that checks
whether methods that never use self are declared static. Compare these
two classes:
class Bad(object):
# self is not used, the method does not need
# to be bound, it should be declared static
def foo(self, a, b, c):
return a + b - c
class OK(object):
# This is correct
@staticmethod
def foo(a, b, c):
return a + b - c
Though the Bad.foo method works fine, strictly speaking it is more cor-
rect to write it as OK.foo (turn back to Chapter 7 for more detail on why). To
check whether all the methods in a Python file are correctly declared, we
need to do the following:
• Iterate over all the statements of the tree and pick out class definitions
(ast.ClassDef).
• Iterate over all the function definitions (ast.FunctionDef) of that class
statement to check whether it is already declared with @staticmethod.
• If the method is not declared static, check whether the first argument
(self) is used somewhere in the method. If self is not used, the method
can be tagged as potentially miswritten.
flake8 discovers its extensions through setuptools entry points, so the first
step is to register the checker and its error codes in the project's setup.cfg,
as shown in Listing 9-6:
[entry_points]
flake8.extension =
--snip--
H904 = ast_ext:StaticmethodChecker
H905 = ast_ext:StaticmethodChecker
In Listing 9-6, we also register two flake8 error codes. As you’ll notice
later, we are actually going to add an extra check to our code while we’re
at it!
The next step is to write the plugin.
class StaticmethodChecker(object):
def __init__(self, tree, filename):
self.tree = tree
def run(self):
pass
The default template is easy to understand: it stores the tree locally for
use in the run() method, which will yield the problems that are discovered.
The value that will be yielded must follow the signature that flake8 expects: a
tuple of the form (lineno, col_offset, error_string, code).
def run(self):
for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue
The code in Listing 9-8 is still not checking for anything, but now it
knows how to ignore statements that are not class definitions. The next step
is to set our checker to ignore anything that is not a function definition.
for body_item in stmt.body:
    # Ignore anything that is not a method definition
    if not isinstance(body_item, ast.FunctionDef):
        continue
    # Check whether the method carries the @staticmethod decorator
    for decorator in body_item.decorator_list:
        if (isinstance(decorator, ast.Name)
                and decorator.id == 'staticmethod'):
            # It's a static function, it's OK
            break
    else:
        # Function is not static, we do nothing for now
        pass
Note that in Listing 9-10, we use the special for/else form of Python,
where the else is evaluated unless we use break to exit the for loop. At this
point, we’re able to detect whether a method is declared static.
--snip--
# Check that it has a decorator
for decorator in body_item.decorator_list:
if (isinstance(decorator, ast.Name)
and decorator.id == 'staticmethod'):
# It's a static function, it's OK
break
else:
try:
first_arg = body_item.args.args[0]
except IndexError:
yield (
body_item.lineno,
body_item.col_offset,
"H905: method misses first argument",
"H905",
)
# Check next method
continue
We finally added a check! This try statement in Listing 9-11 grabs the
first argument from the method signature. If the code fails to retrieve
the first argument from the signature because a first argument doesn’t
exist, we already know there’s a problem: you can’t have a bound method
without the self argument. If the plugin detects that case, it raises the H905
error code we set earlier, signaling a method that misses its first argument.
Note PEP 8 codes follow a particular format for error codes (a letter followed by a number),
but there are no rules as to which code to pick. You could come up with any other code
for this error, as long as it’s not already used by PEP 8 or another extension.
Now you know why we registered two error codes in setup.cfg: we had a
good opportunity to kill two birds with one stone.
--snip--
try:
first_arg = body_item.args.args[0]
except IndexError:
yield (
body_item.lineno,
body_item.col_offset,
"H905: method misses first argument",
"H905",
)
# Check next method
continue
for func_stmt in ast.walk(body_item):
# The checking method must differ between Python 2 and Python 3
if six.PY3:
if (isinstance(func_stmt, ast.Name)
and first_arg.arg == func_stmt.id):
# The first argument is used, it's OK
break
else:
if (func_stmt != first_arg
and isinstance(func_stmt, ast.Name)
and func_stmt.id == first_arg.id):
# The first argument is used, it's OK
break
else:
yield (
body_item.lineno,
body_item.col_offset,
"H904: method should be declared static",
"H904",
)
To check whether the self argument is used in the method’s body, the
plugin in Listing 9-12 iterates recursively, using ast.walk on the body and
looking for the use of the variable named self. If the variable isn’t found,
the program finally yields the H904 error code. Otherwise, nothing happens,
and the code is considered sane.
Note As you may have noticed, the code walks over the module AST definition several
times. There might be some degree of optimization to browsing the AST in only one
pass, but I’m not sure it’s worth it, given how the tool is actually used. I’ll leave that
exercise to you, dear reader.
Knowing the Python AST is not strictly necessary for using Python,
but it does give powerful insight into how the language is built and how it
works. It thus gives you a better understanding of how the code you write is
being used under the hood.
A Quick Introduction to Hy
Now that you have a good understanding of how Python AST works, you can
start dreaming of creating a new syntax for Python. You could parse this new
syntax, build an AST out of it, and compile it down to Python code.
This is exactly what Hy does. Hy is a Lisp dialect that parses a Lisp-like
language and converts it to regular Python AST, making it fully compatible
with the Python ecosystem. You could compare it to what Clojure is to Java.
Hy could fill a book by itself, so we will only skim over it. Hy uses the syntax
and some features of the Lisp family of languages: it’s functionally oriented,
provides macros, and is easily extensible.
If you’re not already familiar with Lisp—and you should be—the Hy
syntax will look familiar. Once you install Hy (by running pip install hy),
launching the hy interpreter will give you a standard REPL prompt from
which you can start to interact with the interpreter, as shown in Listing 9-13.
% hy
hy 0.9.10
=> (+ 1 2)
3
For those not familiar with the Lisp syntax, parentheses are used to
construct lists. If a list is unquoted, it is evaluated: the first element must be
a function, and the rest of the items from the list are passed as arguments.
Here the code (+ 1 2) is equivalent to 1 + 2 in Python.
In Hy, most constructs, such as function definitions, are mapped from
Python directly.
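For example, defining and calling a function (a small sketch using defn,
which maps to Python's def):
=> (defn hello [name]
...  (print "Hello world!")
...  (print (% "Nice to meet you %s" name)))
=> (hello "jd")
Hello world!
Nice to meet you jd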
Class definitions work much the same way:
(defclass A [object]
  [[x 42]
   [y (fn [self value]
        (+ self.x value))]])
Listing 9-15 defines a class named A, which inherits from object, with a
class attribute x whose value is 42; then a method y returns the x attribute
plus a value passed as argument.
What’s really wonderful is that you can import any Python library directly
into Hy and use it with no penalty. Use the import() function to import a
module, as shown in Listing 9-16, just as you would with regular Python:
=> (import uuid)
=> (uuid.uuid4)
UUID('...')
Listing 9-16: Importing a regular Python module from Hy
Hy also keeps constructs inherited from Lisp, such as cond for multi-branch
conditionals:
(cond
[(> somevar 50)
(print "That variable is too big!")]
[(< somevar 10)
(print "That variable is too small!")]
[true
(print "That variable is jusssst right!")])
Summary
Just like any other programming language, Python source code can be rep-
resented using an abstract tree. You’ll rarely use the AST directly, but when
you understand how it works, it can provide a helpful perspective.
As for downsides, Hy, by virtue of being a Lisp written in
s-expressions, suffers from the stigma of being hard to learn,
read, or maintain. People might be averse to working on projects
using Hy for fear of its complexity.
Hy is the Lisp everyone loves to hate. Python folks may not enjoy
its syntax, and Lispers may avoid it because Hy uses Python objects
directly, meaning the behavior of fundamental objects can sometimes
be surprising to the seasoned Lisper.
Hopefully people will look past its syntax and consider exploring
parts of Python previously untouched.
10
Performances and Optimizations

Data Structures
Most programming problems can be solved in an elegant and simple manner
with the right data structures—and Python provides many data structures to
choose from. Learning to leverage those existing data structures results in
cleaner and more stable solutions than coding custom data structures.
For example, everybody uses dict, but how many times have you seen
code trying to access a dictionary by catching the KeyError exception, as
shown here:
def get_fruits(basket, fruit):
    # A variation of this pattern shows up in many programs
    try:
        return basket[fruit]
    except KeyError:
        return None
If you use the get() method already provided by the dict class, you can
avoid having to catch an exception or checking the key’s presence in the
first place:
def get_fruits(basket, fruit):
    return basket.get(fruit)
The method dict.get() can also return a default value instead of None;
just call it with a second argument:
def get_fruits(basket, fruit):
    # Return an empty set rather than None when the fruit is missing
    return basket.get(fruit, set())
Many developers are guilty of using basic Python data structures with-
out being aware of all the methods they provide. This is also true for sets;
methods in set data structures can solve many problems that would other-
wise need to be addressed by writing nested for/if blocks. For example,
developers often use for/if loops to determine whether an item is in a list,
like this:
def has_invalid_fields(fields):
for field in fields:
if field not in ['foo', 'bar']:
return True
return False
The loop iterates over each item in the list and checks that all items are
either foo or bar. But you can write this more efficiently, removing the need
for a loop:
def has_invalid_fields(fields):
return bool(set(fields) - set(['foo', 'bar']))
This version converts fields to a set and subtracts the set of allowed values,
set(['foo', 'bar']); whatever remains are the invalid fields. Converting
the resulting set to a Boolean indicates whether any invalid item is left
over. With sets, there is no need to iterate over the list and check items
one by one: a single operation on two sets, done internally by Python,
is faster.
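A quick check of both the valid and invalid cases:
>>> has_invalid_fields(['foo', 'bar'])
False
>>> has_invalid_fields(['foo', 'bar', 'spam'])
True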
Python also has more advanced data structures that can greatly reduce
the burden of code maintenance. For example, take a look at Listing 10-1.
def add_animal_in_family(species, animal, family):
    if family not in species:
        species[family] = set()
    species[family].add(animal)

species = {}
add_animal_in_family(species, 'cat', 'felidea')
Listing 10-1: Adding entries to a dictionary of sets by hand
This code is perfectly valid, but how many times will your programs
require a variation of Listing 10-1? Tens? Hundreds?
Python provides the collections.defaultdict structure, which solves the
problem in an elegant way:
import collections

def add_animal_in_family(species, animal, family):
    species[family].add(animal)

species = collections.defaultdict(set)
add_animal_in_family(species, 'cat', 'felidea')
Each time you try to access a nonexistent item from your dict, the
defaultdict will use the function that was passed as argument to its con-
structor to build a new value, instead of raising a KeyError. In this case, the
set() function is used to build a new set each time we need it.
The collections module offers a few more data structures that you can
use to solve other kinds of problems. For example, imagine that you want
to count the number of occurrences of each item in an iterable: this is
exactly what collections.Counter does.
The collections.Counter object works with any iterable that has hashable
items, removing the need to write your own counting functions. It can eas-
ily count the number of letters in a string and return the top n most com-
mon items of an iterable. You might have tried to implement something like
this on your own if you were not aware it was already provided by Python’s
Standard Library.
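For example, counting the characters of a string takes a single call (a quick
sketch; the counts naturally depend on the input):
>>> import collections
>>> c = collections.Counter("Premature optimization is the root of all evil.")
>>> c['e']
4
>>> c.most_common(2)
[(' ', 7), ('t', 5)]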
With the right data structure, the correct methods, and—obviously—
an adequate algorithm, your program should perform well. However, if it is
not performing well enough, the best way to get clues about where it might
be slow and need optimization is to profile your code.
cProfile
Python has included cProfile by default since Python 2.5. To use cProfile,
call it with your program using the syntax python -m cProfile <program>. This
should load and enable the cProfile module, then run the regular program
with instrumentation enabled, as shown in Listing 10-2.
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(_getframe)
1 0.000 0.000 0.000 0.000 :0(len)
104 0.000 0.000 0.000 0.000 :0(setattr)
1 0.000 0.000 0.000 0.000 :0(setprofile)
1 0.000 0.000 0.000 0.000 :0(startswith)
2/1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 StringIO.py:30(<module>)
1 0.000 0.000 0.000 0.000 StringIO.py:42(StringIO)
Listing 10-2 shows the output of running a simple script with cProfile.
This tells you the number of times each function in the program was called
and the time spent on its execution. You can also use the -s option to sort
by other fields; for example, -s time would sort the results by internal time.
We can visualize the information generated by cProfile using a great
tool called KCacheGrind. This tool was created to deal with programs
written in C, but luckily we can use it with Python data by converting the
data to a call tree.
The cProfile module has an -o option that allows you to save the profiling
data, and pyprof2calltree can convert data from one format to the other. First,
install the converter with the following:
$ pip install pyprof2calltree
Then run the converter as shown in Listing 10-3 to both convert the
data (-i option) and run KCacheGrind with the converted data (-k option).
$ python -m cProfile -o myscript.cprof myscript.py
$ pyprof2calltree -k -i myscript.cprof
Listing 10-3: Converting cProfile output and launching KCacheGrind
While retrieving information about how your program runs and visual-
izing it works well to get a macroscopic view of your program, you might
need a more microscopic view of some parts of the code to inspect its ele-
ments more closely. In such a case, I find it better to rely on the dis module,
a disassembler of Python bytecode, to find out what's going on behind the
scenes.
To see dis in action and how it can be useful, we’ll define two functions
that do the same thing—concatenate three letters—and disassemble them
to see how they do their tasks in different ways:
abc = ('a', 'b', 'c')

def concat_a_1():
for letter in abc:
abc[0] + letter
def concat_a_2():
a = abc[0]
for letter in abc:
a + letter
>>> dis.dis(concat_a_1)
2 0 SETUP_LOOP 26 (to 29)
3 LOAD_GLOBAL 0 (abc)
6 GET_ITER
>> 7 FOR_ITER 18 (to 28)
10 STORE_FAST 0 (letter)
3 13 LOAD_GLOBAL 0 (abc)
16 LOAD_CONST 1 (0)
19 BINARY_SUBSCR
20 LOAD_FAST 0 (letter)
23 BINARY_ADD
24 POP_TOP
25 JUMP_ABSOLUTE 7
>> 28 POP_BLOCK
>> 29 LOAD_CONST 0 (None)
32 RETURN_VALUE
>>> dis.dis(concat_a_2)
2 0 LOAD_GLOBAL 0 (abc)
3 LOAD_CONST 1 (0)
6 BINARY_SUBSCR
7 STORE_FAST 0 (a)
4 23 LOAD_FAST 0 (a)
26 LOAD_FAST 1 (letter)
29 BINARY_ADD
30 POP_TOP
31 JUMP_ABSOLUTE 17
>> 34 POP_BLOCK
In the second version, abc[0] is stored in a local variable before the loop
runs, so the lookup happens once rather than on every iteration; that is
why concat_a_2 executes fewer opcodes inside its loop and runs faster than
concat_a_1.
Disassembly can also reveal the cost of defining a function inside another
function: the inner function object is rebuilt every time the outer function
runs. Below are two partial disassemblies of outer functions that define and
call an inner function y(); the second closes over a variable, which requires
the extra LOAD_CLOSURE and MAKE_CLOSURE steps:
4 9 LOAD_FAST 0 (y)
12 CALL_FUNCTION 0
15 RETURN_VALUE
3 6 LOAD_CLOSURE 0 (a)
9 BUILD_TUPLE 1
12 LOAD_CONST 2 (<code object y at
x100d139b0, file "<stdin>", line 3>)
15 MAKE_CLOSURE 0
18 STORE_FAST 0 (y)
5 21 LOAD_FAST 0 (y)
24 CALL_FUNCTION 0
27 RETURN_VALUE
While you probably won’t need to use it every day, disassembling code is
a handy tool for when you want a closer look at what happens under the hood.
Ordered Lists and bisect
To keep a list sorted as you insert new items, the bisect module offers a
binary-search-based insort() function. Starting from a sorted list:
>>> import bisect
>>> farm = sorted(['haystack', 'needle', 'cow', 'pig'])
>>> farm
['cow', 'haystack', 'needle', 'pig']
>>> bisect.insort(farm, 'eggs')
>>> farm
['cow', 'eggs', 'haystack', 'needle', 'pig']
>>> bisect.insort(farm, 'turkey')
>>> farm
['cow', 'eggs', 'haystack', 'needle', 'pig', 'turkey']
Using the bisect module, you could also create a special SortedList
class inheriting from list to create a list that is always sorted, as shown in
Listing 10-10:
import bisect
import unittest


class SortedList(list):
    def __init__(self, iterable):
        super(SortedList, self).__init__(sorted(iterable))

    # insort() and index() are exercised by the tests below but omitted
    # from the excerpt; these are minimal reconstructions
    def insort(self, item):
        bisect.insort(self, item)

    def extend(self, other):
        for item in other:
            self.insort(item)

    @staticmethod
    def append(o):
        raise RuntimeError("Cannot append to a sorted list")

    def index(self, value, start=None, stop=None):
        place = bisect.bisect_left(self[start:stop], value)
        if start is not None:
            place += start
        if place < len(self) and self[place] == value:
            return place
        raise ValueError("%s is not in list" % value)


class TestSortedList(unittest.TestCase):
    def setUp(self):
        self.mylist = SortedList(
            ['a', 'c', 'd', 'x', 'f', 'g', 'w']
        )

    def test_sorted_init(self):
        self.assertEqual(sorted(['a', 'c', 'd', 'x', 'f', 'g', 'w']),
                         self.mylist)

    def test_sorted_insort(self):
        self.mylist.insort('z')
        self.assertEqual(['a', 'c', 'd', 'f', 'g', 'w', 'x', 'z'],
                         self.mylist)
        self.mylist.insort('b')
        self.assertEqual(['a', 'b', 'c', 'd', 'f', 'g', 'w', 'x', 'z'],
                         self.mylist)

    def test_index(self):
        self.assertEqual(0, self.mylist.index('a'))
        self.assertEqual(1, self.mylist.index('c'))
        self.assertEqual(5, self.mylist.index('w'))
        self.assertEqual(0, self.mylist.index('a', stop=0))
        self.assertEqual(0, self.mylist.index('a', stop=2))
        self.assertEqual(0, self.mylist.index('a', stop=20))
        self.assertRaises(ValueError, self.mylist.index, 'w', stop=3)
        self.assertRaises(ValueError, self.mylist.index, 'a', start=3)
        self.assertRaises(ValueError, self.mylist.index, 'a', start=333)

    def test_extend(self):
        self.mylist.extend(['b', 'h', 'j', 'c'])
        self.assertEqual(
            ['a', 'b', 'c', 'c', 'd', 'f', 'g', 'h', 'j', 'w', 'x'],
            self.mylist)
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
This definitely gets the job done. However, there is a downside to this
approach. Here we’re creating a class that inherits from the object class, so
by using this Point class, you are instantiating full objects and allocating a
lot of memory.
In Python, regular objects store all of their attributes inside a diction-
ary, and this dictionary is itself stored in the __dict__ attribute, as shown in
Listing 10-11.
>>> p = Point(1, 2)
>>> p.__dict__
{'y': 2, 'x': 1}
>>> p.z = 42
>>> p.z
42
>>> p.__dict__
{'y': 2, 'x': 1, 'z': 42}
For Python, the advantage of using a dict is that it allows you to add as
many attributes as you want to an object. The drawback is that using a dic-
tionary to store these attributes is expensive in terms of memory—you need
to store the object, the keys, the value references, and everything else. That
makes it slow to create and slow to manipulate, with a high memory cost.
As an example of this unnecessary memory usage, consider the follow-
ing simple class:
class Foobar(object):
    def __init__(self, x):
        self.x = x
This defines a simple Foobar class with a single attribute named x. Let's check the memory usage of this class using memory_profiler, a handy Python package that shows the memory usage of a program line by line, and a small script that creates 100,000 objects, as shown in Listing 10-12.
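A minimal sketch of that kind of measurement script, assuming memory_profiler has been installed (pip install memory_profiler):

from memory_profiler import profile


class Foobar(object):
    def __init__(self, x):
        self.x = x


@profile
def create_objects():
    # Allocate 100,000 dict-based instances to observe the per-object cost
    return [Foobar(42) for _ in range(100000)]


if __name__ == '__main__':
    create_objects()

Running a script like this reports a markedly higher per-object cost than the __slots__ approach introduced next; Listing 10-13 shows how CPython's type creation code handles a __slots__ declaration.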
static PyObject *
type_new(PyTypeObject *metatype, PyObject *args, PyObject *kwds)
{
    --snip--

    /* Check for a __slots__ sequence variable in dict, and count it */
    slots = _PyDict_GetItemId(dict, &PyId___slots__);
    nslots = 0;

    if (slots == NULL) {
        if (may_add_dict)
            add_dict++;
        if (may_add_weak)
            add_weak++;
    }
As you can see in Listing 10-13, Python converts the content of __slots__
into a tuple and then into a list, which it builds and sorts before converting
the list back into a tuple to use and store in the class. In this way, Python
can retrieve the values quickly, without having to allocate and use an entire
dictionary.
It’s easy enough to declare and use such a class. All you need to do is
to set the __slots__ attribute to a list of the attributes that will be defined in
the class:
class Foobar(object):
    __slots__ = ('x',)
We can compare the memory usage of the two approaches using the
memory_profiler Python package, as shown in Listing 10-14.
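A sketch of that comparison, measuring the __slots__ variant the same way as the earlier script (same assumptions as before):

from memory_profiler import profile


class Foobar(object):
    __slots__ = ('x',)

    def __init__(self, x):
        self.x = x


@profile
def create_objects():
    # The same 100,000 allocations, now without a per-instance __dict__
    return [Foobar(42) for _ in range(100000)]


if __name__ == '__main__':
    create_objects()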
Listing 10-14 shows that this time, less than 12MB of memory was needed
to create 100,000 objects—or fewer than 120 bytes per object. Thus, by using
the __slots__ attribute of Python classes, we can reduce memory usage, so
when we are creating a large number of simple objects, the __slots__ attribute
is an effective and efficient choice. However, this technique shouldn’t be used
for performing static typing by hardcoding the list of attributes of every class:
doing so wouldn’t be in the spirit of Python programs.
The drawback here is that the list of attributes is now fixed. No new
attribute can be added to the Foobar class at runtime. Due to the fixed
nature of the attribute list, it’s easy enough to imagine classes where the
attributes listed would always have a value and where the fields would
always be sorted in some way.
This is exactly what occurs in the namedtuple class from the collections module. This namedtuple class allows us to dynamically create a class that
will inherit from the tuple class, thus sharing characteristics such as being
immutable and having a fixed number of entries.
Rather than having to reference them by index, namedtuple provides the
ability to retrieve tuple elements by referencing a named attribute. This
makes the tuple easier to access for humans, as shown in Listing 10-15.
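A minimal sketch of what such a listing demonstrates, reusing the chapter's Foobar name:

>>> import collections
>>> Foobar = collections.namedtuple('Foobar', ['x'])
>>> Foobar(42)
Foobar(x=42)
>>> Foobar(42).x
42
>>> Foobar(42).x = 43
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute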
Listing 10-15 shows how you can create a simple class with just one line
of code and then instantiate it. We can’t change any attributes of objects of
this class or add attributes to them, both because the class inherits from
namedtuple and because the __slots__ value is set to an empty tuple, avoiding
the creation of the __dict__. Since a class like this would inherit from tuple,
we can easily convert it to a list.
Listing 10-16 demonstrates the memory usage of the namedtuple class
factory.
At around 13MB for 100,000 objects, using namedtuple is slightly less effi-
cient than using an object with __slots__, but the bonus is that it is compat-
ible with the tuple class. It can therefore be passed to many native Python
functions and libraries that expect an iterable as an argument. A namedtuple
class factory also enjoys the various optimizations that exist for tuples: for
example, tuples with fewer items than PyTuple_MAXSAVESIZE (20 by default)
will use a faster memory allocator in CPython.
The namedtuple class also provides a few extra methods that, even if pre-
fixed by an underscore, are actually intended to be public. The _asdict()
method can convert the namedtuple to a dict instance, the _make() method
allows you to convert an existing iterable object to this class, and _replace()
returns a new instance of the object with some fields replaced.
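For example (a quick sketch; note that _asdict() returns an OrderedDict on the Python versions this book targets, and a plain dict since Python 3.8):

>>> import collections
>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p._asdict()
OrderedDict([('x', 1), ('y', 2)])
>>> Point._make([3, 4])
Point(x=3, y=4)
>>> p._replace(x=5)
Point(x=5, y=2)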
Named tuples are a great replacement for small objects that consist of only a few attributes and do not require any custom methods; consider using them rather than dictionaries, for example. If your data type needs methods, has a fixed list of attributes, and might be instantiated thousands of times, then creating a custom class using __slots__ might be a good idea to save some memory.
Memoization
Memoization is an optimization technique used to speed up function calls
by caching their results. The results of a function can be cached only if the
function is pure, meaning that it has no side effects and does not depend on
any global state. (See Chapter 8 for more on pure functions.)
One trivial function that can be memoized is sin(), shown in Listing 10-17.
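A sketch of such a memoization, caching results in a module-level dictionary:

import math

_SIN_MEMOIZED_VALUES = {}


def memoized_sin(x):
    # Compute math.sin(x) only the first time a given x is seen
    if x not in _SIN_MEMOIZED_VALUES:
        _SIN_MEMOIZED_VALUES[x] = math.sin(x)
    return _SIN_MEMOIZED_VALUES[x]

The functools.lru_cache decorator from the standard library generalizes this pattern and also keeps statistics about how the cache is used.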
Listing 10-18 demonstrates how your cache is being used and how to tell
whether there are optimizations to be made. For example, if the number
of misses is high when the cache is not full, then the cache may be useless
because the arguments passed to the function are never identical. This will
help determine what should or should not be memoized!
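With functools.lru_cache, that information is available through the cache_info() method; a quick sketch:

>>> import functools
>>> import math
>>> @functools.lru_cache(maxsize=2)
... def memoized_sin(x):
...     return math.sin(x)
...
>>> memoized_sin(2)
0.9092974268256817
>>> memoized_sin.cache_info()
CacheInfo(hits=0, misses=1, maxsize=2, currsize=1)
>>> memoized_sin(2)
0.9092974268256817
>>> memoized_sin.cache_info()
CacheInfo(hits=1, misses=1, maxsize=2, currsize=1)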
Faster Python with PyPy
PyPy is an efficient implementation of the Python language that complies
with standards: you should be able to run any Python program with it.
Indeed, the canonical implementation of Python, CPython—so called
because it’s written in C—can be very slow. The idea behind PyPy was to
write a Python interpreter in Python itself. In time, it evolved to be written
in RPython, which is a restricted subset of the Python language.
RPython places constraints on the Python language such that a vari-
able’s type can be inferred at compile time. The RPython code is translated
into C code, which is compiled to build the interpreter. RPython could of
course be used to implement languages other than Python.
What’s interesting in PyPy, besides the technical challenge, is that it is
now at a stage where it can act as a faster replacement for CPython. PyPy
has a just-in-time (JIT) compiler built-in; in other words, it allows the code to
run faster by combining the speed of compiled code with the flexibility of
interpretation.
How fast? That depends, but for pure algorithmic code, it is much
faster. For more general code, PyPy claims to achieve three times the
speed of CPython most of the time. Unfortunately, PyPy also has some of
the limitations of CPython, including the global interpreter lock (GIL), which
allows only one thread to execute at a time.
Though it's not strictly an optimization technique, targeting PyPy as one of your supported Python implementations might be a good idea. To support PyPy, you need to make sure that you are testing your software under PyPy as you would under CPython. In Chapter 6, we discussed tox (see "Using virtualenv with tox" on page 92), which supports building virtual environments using PyPy just as it does for any version of CPython, so putting PyPy support in place should be pretty straightforward.
Testing PyPy support right at the beginning of the project will ensure
that there’s not too much work to do at a later stage if you decide that you
want to be able to run your software with PyPy.
Note For the Hy project discussed in Chapter 9, we successfully adopted this strategy from
the beginning. Hy always has supported PyPy and all other CPython versions without
much trouble. On the other hand, OpenStack failed to do so for its projects and, as
a result, is now blocked by various code paths and dependencies that don’t work on
PyPy for various reasons; they weren’t required to be fully tested in the early stages.
PyPy is compatible with Python 2.7 and Python 3.5, and its JIT com-
piler works on 32- and 64-bit, x86, and ARM architectures and under
various operating systems (Linux, Windows, and Mac OS X). PyPy often
lags behind CPython in features, but it regularly catches up. Unless your
project is reliant on the latest CPython features, this lag might not be a
problem.
@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        content_to_write = content[1024:]
        print("Content length: %d, content to write length %d" %
              (len(content), len(content_to_write)))
        with open("/dev/null", "wb") as target:
            target.write(content_to_write)


if __name__ == '__main__':
    read_random()
What's interesting in Listing 10-20 is that the program's memory usage increases by about 10MB when the variable content_to_write is built: the slice operator copies the entirety of content, minus its first kilobyte, into a new string object, allocating a large chunk of memory for those 10MB.
Performing this kind of operation on large byte arrays is going to be a disaster, since large pieces of memory will be allocated and copied. If you have experience writing C code, you know that calling the memcpy() function has a significant cost in terms of both memory usage and general performance.
But as a C programmer, you’ll also know that strings are arrays of char-
acters and that nothing stops you from looking at only part of an array with-
out copying it. You can do this through the use of basic pointer arithmetic,
assuming that the entire string is in a contiguous memory area.
This is also possible in Python using objects that implement the buffer protocol. The buffer protocol, defined in PEP 3118, is a C API that a type must implement for its objects to expose their underlying memory; the string class, for example, implements it. When an object implements this protocol, you can use the memoryview class constructor to build a new memoryview object that references the original object's memory. For example, Listing 10-21 shows how to use memoryview to access a slice of a string without doing any copying:
>>> s = b"abcdefgh"
>>> view = memoryview(s)
>>> view[1]
98 <1>
>>> limited = view[1:3]
>>> limited
<memory at 0x7fca18b8d460>
>>> bytes(view[1:3])
b'bc'
At <1>, you find the ASCII code for the letter b. In Listing 10-21, we
are making use of the fact that the memoryview object’s slice operator itself
returns a memoryview object. That means it does not copy any data but merely
references a particular slice of it, saving the memory that would be used by
a copy. Figure 10-2 illustrates what happens in Listing 10-21.
Figure 10-2: The memoryview slice limited references bytes b and c of the original buffer abcdefgh without copying them.
We can rewrite the program from Listing 10-19, this time referencing
the data we want to write using a memoryview object rather than allocating a
new string.
@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        # Reference the tail of content through a memoryview instead of
        # copying it (this function body is reconstructed from the
        # description above; only this line differs from Listing 10-19)
        content_to_write = memoryview(content)[1024:]
        print("Content length: %d, content to write length %d" %
              (len(content), len(content_to_write)))
        with open("/dev/null", "wb") as target:
            target.write(content_to_write)


if __name__ == '__main__':
    read_random()
The program in Listing 10-22 uses half the memory of the first version in Listing 10-19, as testing it with memory_profiler again confirms. This kind of trick is especially useful when dealing with sockets: a naive sending loop copies the remaining data on every iteration, as in Listing 10-23:
import socket

s = socket.socket(...)
s.connect(...)
data = b"a" * (1024 * 100000) <1>

while data:
    sent = s.send(data)
    data = data[sent:] <2>
First, we build a bytes object that contains the letter a more than 100 million times <1>. Then we remove the first sent bytes <2>. Using a mechanism like the one in Listing 10-23, a program will copy the data over and over until the socket has sent everything.
We can alter the program in Listing 10-23 to use memoryview to achieve
the same functionality with zero copying, and therefore higher perfor-
mance, as shown in Listing 10-24.
import socket

s = socket.socket(...)
s.connect(...)
data = b"a" * (1024 * 100000) <1>
mv = memoryview(data)

while mv:
    sent = s.send(mv)
    mv = mv[sent:] <2>
First, we build a bytes object that contains the letter a more than 100 million times <1>. Then, we build a new memoryview object pointing to the data that remains to be sent, rather than copying that data <2>. This program won't copy anything, so it won't use any more memory than the 100MB initially needed for the data variable.
We've seen how memoryview objects can be used to write data efficiently, and the same method can be used to read data. Most I/O operations in Python know how to deal with objects implementing the buffer protocol: they can read from them and also write to them. In this case, we don't even need memoryview objects; we can just ask an I/O function to write into our preallocated object, as shown in Listing 10-25.
>>> ba = bytearray(8)
>>> ba
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00')
>>> with open("/dev/urandom", "rb") as source:
... source.readinto(ba)
...
8
>>> ba
bytearray(b'`m.z\x8d\x0fp\xa1')
Combined with memoryview, the same zero-copy approach lets an I/O function write at an arbitrary offset inside a preallocated buffer:
>>> ba = bytearray(8)
>>> ba_at_4 = memoryview(ba)[4:]
>>> with open("/dev/urandom", "rb") as source:
... source.readinto(ba_at_4)
...
4
>>> ba
bytearray(b'\x00\x00\x00\x00\x0b\x19\xae\xb2')
Summary
As we’ve seen in this chapter, there are plenty of ways to make Python code
faster. Choosing the right data structure and using the correct methods for
manipulating the data can have a huge impact in terms of CPU and mem-
ory usage. That’s why it’s important to understand what happens in Python
internally.
However, optimization should never be done prematurely, without proper profiling first. It is too easy to waste time rewriting barely used code with a faster variant while missing the central pain points. Don't miss the big picture.
What’s a good starting strategy for optimizing Python code?
The strategy is the same in Python as in other languages. First, you
need a well-defined use case in order to get a stable and reproducible
benchmark. Without a reliable benchmark, trying different optimiza-
tions may result in wasted time and premature optimization. Useless
optimizations may make the code worse, less readable, or even slower. A
useful optimization must speed the program up by at least 5 percent if
it’s to be worth pursuing.
If a specific part of the code is identified as being “slow,” a bench-
mark should be prepared on this code. A benchmark on a short
function is usually called a micro-benchmark. The speedup should be
at least 20 percent, maybe 25 percent, to justify an optimization on a
micro-benchmark.
It may be interesting to run a benchmark on different computers, different operating systems, or different compilers. For example, the performance of realloc() may vary between Linux and Windows.
What are your recommended tools for profiling or optimizing Python code?
Python 3.3 has a time.perf_counter() function to measure elapsed time
for a benchmark. It has the best resolution available.
A test should be run more than once; three times is a minimum,
and five may be enough. Repeating a test fills disk cache and CPU
caches. I prefer to keep the minimum timing; other developers prefer
the geometric mean.
For micro-benchmarks, the timeit module is easy to use and gives
results quickly, but the results are not reliable using default parameters.
Tests should be repeated manually to get stable results.
Optimizing can take a lot of time, so it’s better to focus on func-
tions that use the most CPU power. To find these functions, Python
has cProfile and profile modules to record the amount of time spent
in each function.
Do you have any Python tricks that could improve performance?
You should reuse the Standard Library as much as possible—it’s well
tested and also usually efficient. Built-in Python types are implemented
in C and have good performance. Use the correct container to get the best performance; Python provides many different kinds of containers: dict, list, deque, set, and so on.
There are some hacks for optimizing Python, but you should avoid
these because they make the code less readable in exchange for a minor
speedup.
The Zen of Python (PEP 20) says, “There should be one—and pref-
erably only one—obvious way to do it." In practice, there are different ways to write Python code, and their performance is not the same. Only trust benchmarks run on your own use case.
11
Scaling and Architecture
projects such as Jython by their very nature lag behind CPython and so are
not really useful targets; innovation happens in CPython, and the other
implementations are just following in CPython’s footsteps.
So, let’s revisit our two use cases with what we now know and figure out
a better solution:
• When you need to run background tasks, you can use multithreading,
but the easier solution is to build your application around an event
loop. There are a lot of Python modules that provide for this, and the
standard is now asyncio. There are also frameworks, such as Twisted,
built around the same concept. The most advanced frameworks will
give you access to events based on signals, timers, and file descriptor
activity—we’ll talk about this later in the chapter in “Event-Driven
Architecture” on page 181.
• When you need to spread the workload, using multiple processes is the
most efficient method. We’ll look at this technique in the next section.
import random
import threading

results = []

def compute():
    results.append(sum(
        [random.randint(1, 100) for i in range(1000000)]))

# Start the 8 workers and wait for them to finish (this tail is
# reconstructed; the excerpt stops after the workers are created)
workers = [threading.Thread(target=compute) for x in range(8)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print("Results: %s" % results)
This has been run on an idle four-core CPU, which means that Python
could potentially have used up to 400 percent of CPU. However, these
results show that it was clearly unable to do that, even with eight threads
running in parallel. Instead, its CPU usage maxed out at 129 percent, which
is just 32 percent of the hardware’s capabilities (129/400).
Now, let’s rewrite this implementation using multiprocessing. For a simple
case like this, switching to multiprocessing is pretty straightforward, as
shown in Listing 11-2.
import multiprocessing
import random

def compute(n):
    return sum(
        [random.randint(1, 100) for i in range(1000000)])

# Start 8 workers
pool = multiprocessing.Pool(processes=8)
print("Results: %s" % pool.map(compute, range(8)))
0498016, 50537899]
python workermp.py 16.53s user 0.12s system 363% cpu 4.581 total
Event-Driven Architecture
Event-driven programming is characterized by the use of events, such as
user input, to dictate how control flows through a program, and it is a
good solution for organizing program flow. The event-driven program
listens for various events happening on a queue and reacts based on
those incoming events.
Let's say you want to build an application that listens for a connection on a socket and then processes the connection it receives. There are basically three ways to approach the problem; Listing 11-3 takes the event-driven route, relying on nonblocking sockets and the select() system call:
import select
import socket

server = socket.socket(socket.AF_INET,
                       socket.SOCK_STREAM)
# Never block on read/write operations
server.setblocking(0)
# Bind to an arbitrary local port for this example (the rest of this
# listing is reconstructed; the excerpt cuts off inside the loop)
server.bind(('localhost', 10000))
server.listen(8)

while True:
    # select() returns 3 arrays containing the object (sockets, files...)
    # that are ready to be read, written to, or are in error
    inputs, outputs, excepts = select.select([server], [], [server])
    if server in inputs:
        connection, client_address = server.accept()
        connection.send(b"hello!\n")
Listing 11-3: Event-driven program that listens for and processes connections
Python interfaces to event libraries such as libevent, libev, or libuv also provide very efficient event loops.
These options all solve the same problem. The downside is that, while
there are a wide variety of choices, most of them are not interoperable.
Many are also callback based, meaning that the program flow is not very
clear when reading the code; you have to jump to a lot of different places
to read through the program.
Another option would be the gevent or greenlet libraries, which avoid
callback use. However, the implementation details include CPython x86–
specific code and dynamic modification of standard functions at runtime,
meaning you wouldn’t want to use and maintain code using these libraries
over the long term.
In 2012, Guido van Rossum began work on a solution code-named tulip,
documented under PEP 3156 (https://www.python.org/dev/peps/pep-3156). The
goal of this package was to provide a standard event loop interface that would
be compatible with all frameworks and libraries and be interoperable.
The tulip code has since been renamed and merged into Python 3.4 as
the asyncio module, and it is now the de facto standard. Not all libraries are
compatible with asyncio, and most existing bindings need to be rewritten.
As of Python 3.6, asyncio has been so well integrated that it has its own
await and async keywords, making it straightforward to use. Listing 11-4
shows how the aiohttp library, which provides an asynchronous HTTP bind-
ing, can be used with asyncio to run several web page retrievals concurrently.
import aiohttp
import asyncio

async def get(url):
    # This coroutine is reconstructed; the excerpt omits it, and the URL
    # used below is illustrative
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.read()

loop = asyncio.get_event_loop()
coroutines = [get("http://example.com") for _ in range(8)]
results = loop.run_until_complete(asyncio.gather(*coroutines))
Service-Oriented Architecture
Circumventing Python’s scaling shortcomings can seem tricky. However,
Python is very good at implementing service-oriented architecture (SOA), a style
of software design in which different components provide a set of services
through a communication protocol. For example, OpenStack uses SOA in all of its components. The components use HTTP REST to
communicate with external clients (end users) and an abstracted remote
procedure call (RPC) mechanism that is built on top of the Advanced
Message Queuing Protocol (AMQP).
In your development situations, knowing which communication chan-
nels to use between those blocks is mainly a matter of knowing with whom
you will be communicating.
When exposing a service to the outside world, the preferred channel is
HTTP, especially for stateless designs such as REST-style (REpresentational
State Transfer–style) architectures. These kinds of architectures make it
easier to implement, scale, deploy, and comprehend services.
However, when exposing and using your API internally, HTTP may not be the best protocol. There are many other communication protocols, and fully describing even one of them would likely fill an entire book.
In Python, there are plenty of libraries for building RPC systems. Kombu is interesting because it provides an RPC mechanism on top of a lot of backends, with the AMQP protocol being the main one. It also supports Redis, MongoDB, Beanstalk, Amazon SQS, CouchDB, and ZooKeeper.
In the end, you can indirectly gain a huge amount of performance from
using such loosely coupled architecture. If we consider that each module pro-
vides and exposes an API, we can run multiple daemons that can also expose
that API, allowing multiple processes—and therefore CPUs—to handle the
workload. For example, Apache httpd would create a new worker using a new
system process that handles new connections; we could then dispatch a con-
nection to a different worker running on the same node. To do so, we just
need a system for dispatching the work to our various workers, which this API
provides. Each block will be a different Python process, and as we’ve seen
previously, this approach is better than multithreading for spreading out your
workload. You’ll be able to start multiple workers on each node. Even if state-
less blocks are not strictly necessary, you should favor their use anytime you
have the choice.
Interprocess Communication with ZeroMQ
As we’ve just discussed, a messaging bus is always needed when building
distributed systems. Your processes need to communicate with each other in
order to pass messages. ZeroMQ is a socket library that can act as a concurrency
framework. Listing 11-5 implements the same worker seen in Listing 11-1 but
uses ZeroMQ as a way to dispatch work and communicate between processes.
import multiprocessing
import random

import zmq

def compute():
    return sum(
        [random.randint(1, 100) for i in range(1000000)])

def worker():
    context = zmq.Context()
    work_receiver = context.socket(zmq.PULL)
    work_receiver.connect("tcp://0.0.0.0:5555")
    result_sender = context.socket(zmq.PUSH)
    result_sender.connect("tcp://0.0.0.0:5556")
    poller = zmq.Poller()
    poller.register(work_receiver, zmq.POLLIN)

    while True:
        socks = dict(poller.poll())
        if socks.get(work_receiver) == zmq.POLLIN:
            obj = work_receiver.recv_pyobj()
            result_sender.send_pyobj(obj())

context = zmq.Context()
# Build a channel to send work to be done
work_sender = context.socket(zmq.PUSH)
work_sender.bind("tcp://0.0.0.0:5555")
# Build a channel to receive computed results
result_receiver = context.socket(zmq.PULL)
result_receiver.bind("tcp://0.0.0.0:5556")

# Start 8 workers
processes = []
for x in range(8):
    p = multiprocessing.Process(target=worker)
    p.start()
    processes.append(p)

# Send 8 jobs
for x in range(8):
    work_sender.send_pyobj(compute)

# Read 8 results
results = []
for x in range(8):
    results.append(result_receiver.recv_pyobj())

# Terminate the workers once all results are in (this tail is
# reconstructed; the excerpt stops after the results are read)
for p in processes:
    p.terminate()

print("Results: %s" % results)
Summary
The rule of thumb in Python is to use threads only for I/O-intensive work-
loads and to switch to multiple processes as soon as a CPU-intensive work-
load is on the table. Distributing workloads on a wider scale—such as when
building a distributed system over a network—requires external libraries
and protocols. These are supported by Python, though provided externally.
12
Managing Relational Databases
Note This chapter assumes you know basic SQL. Introducing SQL queries and discussing
how tables work is beyond the scope of this book. If you’re new to SQL, I recommend
learning the basics before continuing. Practical SQL by Anthony DeBarros (No
Starch Press, 2018) is a good place to start.
We want to detect any duplicate messages received and exclude them
from the database. To do this, a typical developer might write SQL using an
ORM, as shown in Listing 12-1.
if query.select(Message).filter(Message.id == some_id):
    # We already have the message, it's a duplicate, ignore and raise
    raise DuplicateMessage(message)
else:
    # Insert the message
    query.insert(message)
This code works for most cases, but it has some major drawbacks:

•	The uniqueness constraint is most likely already expressed in the SQL schema itself, so the check duplicates logic the RDBMS already enforces.
•	It takes two SQL queries to do the job of one: a SELECT followed by an INSERT.
•	It has a race condition: another client can insert the same message between our SELECT and our INSERT.
There’s a much better way to write this code, but it requires coopera-
tion with the RDBMS server. Rather than checking for the message’s exis-
tence and then inserting it, we can insert it right away and use a try...except
block to catch a duplicate conflict:
try:
    # Insert the message
    message_table.insert(message)
except UniqueViolationError:
    # Duplicate
    raise DuplicateMessage(message)
In this case, inserting the message directly into the table works flaw-
lessly if the message is not already present. If it is, the ORM raises an
exception indicating the violation of the uniqueness constraint. This
method achieves the same effect as Listing 12-1 but in a more efficient
fashion and without any race condition. This is a very simple pattern, and
it doesn’t conflict with any ORM in any way. The problem is that develop-
ers tend to treat SQL databases as dumb storage rather than as a tool they
can use to get proper data integrity and consistency; consequently, they
may duplicate the constraints written in SQL in their controller code rather
than in their model.
Database Backends
ORM libraries support multiple database backends. No ORM library provides a
complete abstraction of all RDBMS features, and simplifying the code to
the most basic RDBMS available will make using any advanced RDBMS
functions impossible without breaking the abstraction layer. Even simple
things that aren’t standardized in SQL, such as handling timestamp opera-
tions, are a pain to deal with when using an ORM. This is even more true if
your code is RDBMS agnostic. It is important to keep this in mind when you
choose your application’s RDBMS.
Isolating ORM libraries (as described in “External Libraries” on
page 22) helps mitigate potential problems. This approach allows you
to easily swap your ORM library for a different one should the need arise
and to optimize your SQL usage by identifying places with inefficient
query usage, which lets you bypass most of the ORM boilerplate.
For example, you can use your ORM in a module of your application,
such as myapp.storage, to easily build in such isolation. This module should
export only functions and methods that allow you to manipulate the data
at a high level of abstraction. The ORM should be used only from that
module. At any point, you will be able to drop in any module providing the
same API to replace myapp.storage.
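As a rough sketch of such an isolation layer (the module layout, model names, and connection URL here are illustrative, not from the book):

# myapp/storage.py: the only module that touches the ORM directly
import sqlalchemy
import sqlalchemy.orm

from myapp import models  # hypothetical module defining the Message model

_ENGINE = sqlalchemy.create_engine("postgresql://localhost/mydatabase")
_SESSION = sqlalchemy.orm.sessionmaker(bind=_ENGINE)


def get_message_by_id(message_id):
    # Callers get plain model objects, never ORM query machinery
    return _SESSION().query(models.Message).get(message_id)


def store_message(message):
    session = _SESSION()
    session.add(message)
    session.commit()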
The most commonly used ORM library in Python (and arguably the
de facto standard) is sqlalchemy. This library supports a huge number of
backends and provides abstraction for most common operations. Schema
upgrades can be handled by third-party packages such as alembic (https://
pypi.python.org/pypi/alembic/).
Some frameworks, such as Django (https://www.djangoproject.com), pro-
vide their own ORM libraries. If you choose to use a framework, it’s smart
to use the built-in library because it will often integrate better with the
framework than an external one.
Warning The model-view-controller (MVC) architecture that most frameworks rely on can be
easily misused. These frameworks implement (or make it easy to implement) ORM in
their models directly, but without abstracting enough of it: any code you have in your
view and controllers that use the model will also be using ORM directly. You need
to avoid this. You should write a data model that includes the ORM library rather
than consists of it. Doing so provides better testability and isolation, and makes
swapping out the ORM with another storage technology much easier.
Writing the Data-Streaming Application
The purpose of the micro-application in Listing 12-2 is to store messages in
a SQL table and provide access to those messages via an HTTP REST API.
Each message consists of a channel number, a source string, and a content
string.
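A plausible schema for that message table (a sketch; the column names and types are inferred from the JSON payload shown later in this section):

CREATE TABLE message (
  id SERIAL PRIMARY KEY,
  channel INTEGER NOT NULL,
  source TEXT NOT NULL,
  content TEXT NOT NULL
);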
We also want to stream these messages to the client so that it can pro-
cess them in real time. To do this, we’re going to use the LISTEN and NOTIFY
features of PostgreSQL. These features allow us to listen for messages sent
by a function we provide that PostgreSQL will execute:
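A sketch of such a function and its trigger, assuming the message table above; pg_notify() publishes the freshly inserted row, serialized to JSON, on a channel named after the message's channel number:

CREATE OR REPLACE FUNCTION notify_on_insert() RETURNS trigger AS $$
BEGIN
  -- Send the new row as JSON on channel_<channel number>
  PERFORM pg_notify('channel_' || NEW.channel,
                    row_to_json(NEW)::text);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER notify_on_message_insert
AFTER INSERT ON message
FOR EACH ROW EXECUTE PROCEDURE notify_on_insert();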
The function is now plugged in and will be executed upon each success-
ful INSERT performed in the message table.
$ psql
psql (9.3rc1)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.
As soon as the row is inserted, the notification is sent, and we’re able to
receive it through the PostgreSQL client. Now all we have to do is build the
Python application that streams this event, shown in Listing 12-3.
import psycopg2
import psycopg2.extensions
import select

# The connection setup is reconstructed here, reusing the parameters of
# the Flask version of this program shown below
conn = psycopg2.connect(database='mydatabase', user='mydatabase',
                        password='mydatabase', host='localhost')
conn.set_isolation_level(
    psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
curs = conn.cursor()
curs.execute("LISTEN channel_1;")

while True:
    select.select([conn], [], [])
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop()
        print("Got NOTIFY:", notify.pid, notify.channel,
              notify.payload)
The program listens on channel_1, and as soon as it receives a notifica-
tion, prints it to the screen. If we run the program and insert a row in the
message table, we get the following output:
$ python listen.py
Got NOTIFY: 28797 channel_1
{"id":10,"channel":1,"source":"jd","content":"hello world"}
As soon as we insert the row, PostgreSQL runs the trigger and sends a
notification. Our program receives it and prints the notification payload;
here, that’s the row serialized to JSON. We now have the basic ability to
receive data as it is inserted into the database, without doing any extra
requests or work.
import flask
import psycopg2
import psycopg2.extensions
import select

app = flask.Flask(__name__)

def stream_messages(channel):
    conn = psycopg2.connect(database='mydatabase', user='mydatabase',
                            password='mydatabase', host='localhost')
    conn.set_isolation_level(
        psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    curs = conn.cursor()
    curs.execute("LISTEN channel_%d;" % int(channel))

    while True:
        select.select([conn], [], [])
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop()
            yield "data: " + notify.payload + "\n\n"

@app.route("/message/<channel>", methods=['GET'])
def get_messages(channel):
    return flask.Response(stream_messages(channel),
                          mimetype='text/event-stream')

if __name__ == "__main__":
    app.run()
Note For the sake of simplicity, this example application has been written in a single file. If
this were a real application, I would move the storage-handling implementation into
its own Python module.
$ python listen+http.py
* Running on http://127.0.0.1:5000/
$ curl -v http://127.0.0.1:5000/message/1
* About to connect() to 127.0.0.1 port 5000 (#0)
* Trying 127.0.0.1...
* Adding handle: conn: 0x1d46e90
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x1d46e90) send_pipe: 1, recv_pipe: 0
* Connected to 127.0.0.1 (127.0.0.1) port 5000 (#0)
> GET /message/1 HTTP/1.1
> User-Agent: curl/7.32.0
> Host: 127.0.0.1:5000
> Accept: */*
>
But as soon as we insert some rows in the message table, we’ll start seeing
data coming in through the terminal running curl. In a third terminal, we
insert a message in the database:
mydatabase=> INSERT INTO message(channel, source, content)
mydatabase-> VALUES(1, 'jd', 'it works');
INSERT 0 1
This data is printed to the terminal running curl, and curl stays connected to the HTTP server while it waits for the next flow of messages. We created a streaming service without doing any kind of polling, building an entirely push-based system where information flows from one point to another seamlessly.
A naive and arguably more portable implementation of this application
would instead repeatedly loop over a SELECT statement to poll for new data
inserted in the table. This would work with any other storage system that
does not support a publish-subscribe pattern as this one does.
• Concurrency: Access your data for read or write with as many concur-
rent threads of execution as you want—the RDBMS is there to handle
that correctly for you. That’s the main feature you want out of an RDBMS.
• Concurrency semantics: The details about the concurrency behavior
when using an RDBMS are proposed with a high-level specification
in terms of atomicity and isolation, which are maybe the most crucial
parts of ACID (atomicity, consistency, isolation, durability). Atomicity
is the property that between the time you BEGIN a transaction and the
time you’re done with it (either COMMIT or ROLLBACK), no other concurrent
activity on the system is allowed to know what you're doing—whatever that is. With a proper RDBMS, that even includes Data Definition Language (DDL) statements, for example, CREATE TABLE or ALTER TABLE. Isolation is
all about what you’re allowed to notice of the concurrent activity of the
system from within your own transaction. The SQL standard defines
four levels of isolation, as described in the PostgreSQL documentation
(http://www.postgresql.org/docs/9.2/static/transaction-iso.html).
PostgreSQL also implements SECURITY DEFINER stored procedures, allowing you to offer access to sensitive data in a very controlled way, much the same as with saved user ID (SUID) programs.
The RDBMS gives you access to your data using SQL, which became the de facto standard in the '80s and is now driven by a committee. In the case of PostgreSQL, lots of extensions are added with each and every major release, giving you access to a very rich DSL. All the work of query planning and optimization is done for you by the RDBMS so that you can focus on a declarative query where you describe only the result you want from the data you have.
And that’s also why you need to pay close attention to the NoSQL
offerings here, as most of those trendy products are in fact not remov-
ing just the SQL from the offering but a whole lot of other foundations
that you’ve been trained to expect.
What advice would you give to developers using RDBMSs as their storage
backends?
My advice is to remember the differences between a storage backend
and an RDBMS. Those are very different services, and if all you need
is a storage backend, maybe consider using something other than an
RDBMS.
Most often, though, what you really need is a full-blown RDBMS. In
that case, the best option you have is PostgreSQL. Go read its documen-
tation (https://www.postgresql.org/docs/); see the list of data types, opera-
tors, functions, features, and extensions it provides. Read some usage
examples on blog posts.
Then consider PostgreSQL a tool you can leverage in your devel-
opment and include it in your application architecture. Parts of the
services you need to implement are best offered at the RDBMS layer,
and PostgreSQL excels at being that trustworthy part of your whole
implementation.
What’s the best way to use or not use an ORM?
The ORM will work best for CRUD applications: create, read, update, and delete. The read part should be limited to a very simple SELECT statement targeting a single table, as retrieving more columns than necessary has a significant impact on query performance and resource usage.
Any column you retrieve from the RDBMS that you end up not using is a pure waste of precious resources and a first scalability killer. Even when your ORM is able to fetch only the data you're asking for, you still
then have to somehow manage the exact list of columns you want in
each situation, without using a simple abstract method that will auto-
matically compute the fields list for you.
The create, update, and delete queries are simple INSERT, UPDATE, and
DELETE statements. Many RDBMSs offer optimizations that are not lever-
aged by ORMs, such as returning data after an INSERT.
• Time to market: When you’re really in a hurry and want to gain market
share as soon as possible, the only way to get there is to release a first
version of your application and idea. If your team is more proficient at
using an ORM than handcrafting SQL queries, then by all means just do
that. You have to realize, though, that as soon as you’re successful with
your application, one of the first scalability problems you will have to
solve is going to be related to your ORM producing really bad queries.
Also, your usage of the ORM will have painted you into a corner and
resulted in bad code design decisions. But if you’re there, you’re success-
ful enough to spend some refactoring money and remove any depen-
dency on the ORM, right?
• CRUD application: This is the real thing, where you are only editing a
single tuple at a time and you don’t really care about performance, like
for the basic admin application interface.
What are the pros of using PostgreSQL over other databases when
working with Python?
Here are my top reasons for choosing PostgreSQL as a developer:
• Data types, functions, operators, arrays, and ranges: PostgreSQL has
a very rich set of data types that come with a host of operators and func-
tions. It’s even possible to denormalize using arrays or JSON data types
and still be able to write advanced queries, including joins, against those.
• The planner and optimizer: It’s worth taking the time to understand
how complex and powerful these are.
• Transactional DDL: It’s possible to ROLLBACK almost any command. Try
it now: just open your psql shell against a database you have and type in
BEGIN; DROP TABLE foo; ROLLBACK;, where you replace foo with the name of
a table that exists in your local instance. Amazing, right?
• PL/Python (and others such as C, SQL, JavaScript, or Lua): You can
run your own Python code on the server, right where the data is, so you
don’t have to fetch it over the network just to process it and then send it
back in a query to do the next level of JOIN.
• Specific indexing (GiST, GIN, SP-GiST, partial and functional):
You can create Python functions to process your data from within
PostgreSQL and then index the result of calling that function. When
you issue a query with a WHERE clause calling that function, it’s called
only once with the data from the query; then it’s matched directly with
the contents of the index.
for k, v in mydict.iteritems():
    print(k, v)
Using six, you would replace the mydict.iteritems() code with Python 2-
and 3-compliant code like so:
import six

for k, v in six.iteritems(mydict):
    print(k, v)
In Python 3, the basic string type is still str, but it shares the properties
of the Python 2 unicode class and can handle advanced encodings. The bytes
type replaces the str type for handling basic character streams.
The six module again provides functions and constants, such as six.u and six.string_types, to handle the transition. The same compatibility is provided for integers, with six.integer_types handling the long type that has been removed in Python 3.
# six.moves exposes configparser under both Python 2 and 3 (import line
# reconstructed; the excerpt shows only the call)
from six.moves.configparser import ConfigParser

conf = ConfigParser()
Listing 13-1: Using six.moves to use ConfigParser() with Python 2 and Python 3
You can also add your own moves via six.add_move to handle code transi-
tions that six doesn’t handle natively.
In the event that the six library doesn’t cover all your use cases, it may
be worth building a compatibility module encapsulating six itself, thereby
ensuring that you will be able to enhance the module to fit future versions
of Python or dispose of (part of) it when you want to stop supporting a par-
ticular version of the language. Also note that six is open source and that
you can contribute to it rather than maintain your own hacks!
(defclass snare-drum ()
())
(defclass cymbal ()
())
(defclass stick ()
())
(defclass brushes ()
())
This defines the classes snare-drum, cymbal, stick, and brushes without any
parent class or attributes. These classes compose a drum kit, and we can
combine them to play sound. For this, we define a play() method that takes
two arguments and returns a sound as a string:
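(defgeneric play (instrument accessory))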
This only defines a generic method that isn’t attached to any class and
so cannot yet be called. At this stage, we’ve only informed the object system
that the method is generic and might be called with two arguments named
instrument and accessory. In Listing 13-2, we’ll implement versions of this
method that simulate playing our snare drum.
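A sketch of those methods, with the sound strings matching the Python version that appears later in this chapter:

(defmethod play ((instrument snare-drum) (accessory stick))
  "POC!")

(defmethod play ((instrument snare-drum) (accessory brushes))
  "SHHHH!")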
Now we’ve defined concrete methods in code. Each method takes two
arguments: instrument, which is an instance of snare-drum or cymbal, and
accessory, which is an instance of stick or brushes.
At this stage, you should see the first major difference between this sys-
tem and the Python (or similar) object systems: the method isn’t tied to any
particular class. The methods are generic, and they can be implemented for
any class.
Let’s try it. We can call our play() method with some objects:
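* (play (make-instance 'snare-drum) (make-instance 'stick))
"POC!"

* (play (make-instance 'snare-drum) (make-instance 'brushes))
"SHHHH!"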
As you can see, which function is called depends on the class of the
arguments—the object system dispatches the function calls to the right func-
tion for us, based on the type of the arguments we pass. If we call play()
with an object whose classes do not have a method defined, an error will be
thrown.
In Listing 13-3, the play() method is called with a cymbal and a stick
instance; however, the play() method has never been defined for those
arguments, so it raises an error.
import functools


# The four instrument and accessory classes (reconstructed; the text
# below refers to them)
class SnareDrum(object): pass
class Cymbal(object): pass
class Stick(object): pass
class Brushes(object): pass


@functools.singledispatch
def play(instrument, accessory):
    raise NotImplementedError("Cannot play these")


u @play.register(SnareDrum)
def _(instrument, accessory):
    if isinstance(accessory, Stick):
        return "POC!"
    if isinstance(accessory, Brushes):
        return "SHHHH!"
    raise NotImplementedError("Cannot play these")


@play.register(Cymbal)
def _(instrument, accessory):
    if isinstance(accessory, Brushes):
        return "FRCCCHHT!"
    raise NotImplementedError("Cannot play these")
This listing defines our four classes and a base play() function that raises
NotImplementedError, indicating that by default we don’t know what to do.
We then write a specialized version of the play() function for a specific
instrument, the SnareDrum u. This function checks which accessory type has
been passed and returns the appropriate sound or raises NotImplementedError
again if the accessory isn’t recognized.
If we run the program, it works as follows:
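>>> play(SnareDrum(), Stick())
'POC!'
>>> play(SnareDrum(), Brushes())
'SHHHH!'
>>> play(Cymbal(), Brushes())
'FRCCCHHT!'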
Because the base function is registered for the object class, the first defined version of the function is the one that runs for any unregistered type. Therefore, if our instrument is an instance of a class that we did not register, this base function will be called.
As we saw in the Lisp version of the code, CLOS provides a multiple dis-
patcher that can dispatch based on the type of any of the arguments defined
in the method prototype, not just the first one. The Python dispatcher
is named singledispatch for a good reason: it only knows how to dispatch
based on the first argument.
In addition, singledispatch offers no way to call the parent function
directly. There is no equivalent of the Python super() function; you’ll have
to use various tricks to bypass this limitation.
While Python is improving its object system and dispatch mechanism,
it still lacks a lot of the more advanced features that something like CLOS
provides out of the box. That makes encountering singledispatch in the wild
pretty rare. It’s still interesting to know it exists, as you may end up imple-
menting such a mechanism yourself at some point.
Context Managers
The with statement introduced in Python 2.6 is likely to remind old-time
Lispers of the various with-* macros that are often used in that language.
Python provides a similar-looking mechanism with the use of objects that
implement the context management protocol.
If you’ve never used the context management protocol, here’s how it
works. The code block contained inside the with statement is surrounded by
two function calls. The object being used in the with statement determines
the two calls. Those objects are said to implement the context management
protocol.
Objects like those returned by open() support this protocol; that’s why
you can write code along these lines:
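with open("myfile", "r") as f:
    line = f.readline()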
The object returned by open() has two methods: one called __enter__
and one called __exit__. These methods are called at the start of the with
block and at the end of it, respectively.
A simple implementation of a context object is shown in Listing 13-5.
class MyContext(object):
    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, traceback):
        pass
The context management protocol is appropriate whenever you identify the following pattern in your code:

1. Call method A.
2. Execute some code.
3. Call method B.
The open() function illustrates this pattern well: the constructor that
opens the file and allocates a file descriptor internally is method A. The
close() method that releases the file descriptor corresponds to method B.
Obviously, the close() function is always meant to be called after you instan-
tiate the file object.
It can be tedious to implement this protocol manually, so the contextlib
standard library provides the contextmanager decorator to make implemen-
tation easier. The contextmanager decorator should be used on a generator
function. The __enter__ and __exit__ methods will be dynamically imple-
mented for you based on the code that wraps the yield statement of the
generator.
In Listing 13-6, MyContext is defined as a context manager.
import contextlib

@contextlib.contextmanager
def MyContext():
    print("do something first")
    yield
    print("do something else")

with MyContext():
    print("hello world")
The code before the yield statement will be executed before the with
statement body is run; the code after the yield statement will be executed
once the body of the with statement is over. When run, this program out-
puts the following:
do something first
hello world
do something else
There are a couple of things to handle here though. First, it’s possible
to yield something inside our generator that can be used as part of the
with block.
Listing 13-7 shows how to yield a value to the caller. The keyword as is
used to store this value in a variable.
import contextlib

@contextlib.contextmanager
def MyContext():
    print("do something first")
    yield 42
    print("do something else")

# Use of the context manager (reconstructed; the output below shows 42
# being printed)
with MyContext() as value:
    print(value)
When executed, the code outputs the following:
do something first
42
do something else
If an exception is raised inside the with block, it is propagated back through the generator at the point of the yield, so wrapping the yield in a try...finally block guarantees that the cleanup code always runs, as in Listing 13-8:

import contextlib

@contextlib.contextmanager
def MyContext():
    print("do something first")
    try:
        yield 42
    finally:
        print("do something else")

# Use of the context manager (reconstructed; it matches the output below)
with MyContext() as value:
    print("about to raise")
    raise ValueError("let's try it")
do something first
about to raise
do something else
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
ValueError: let's try it
As you can see, the error is raised back through the context manager, and the cleanup code after the yield statement still runs, because the try...finally block guarantees its execution whether or not the with block raises.
In some contexts, it can be useful to use several context managers at
the same time, for example, when opening two files at the same time to
copy their content, as shown in Listing 13-9.
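A sketch, with illustrative filenames:

with open("file1", "r") as source:
    with open("file2", "w") as destination:
        destination.write(source.read())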
Listing 13-9: Opening two files at the same time to copy content
That being said, since the with statement supports multiple arguments,
it’s actually more efficient to write a version using a single with, as shown in
Listing 13-10.
Listing 13-10: Opening two files at the same time using only one with statement
class Car(object):
    def __init__(self, color, speed=0):
        self.color = color
        self.speed = speed
The process is always the same: you copy the values of the arguments passed to the __init__ function into a few attributes stored on the object. Sometimes you'll also have to check the values that are passed, compute defaults, and so on.
Obviously, you also want your object to be represented correctly if
printed, so you’ll have to implement a __repr__ method. There’s a chance
some of your classes are simple enough to be converted to dictionaries for
serialization. Things become even more complicated when talking about
comparison and hashability (the ability to use hash on an object and store it
in a set).
In reality, most Python programmers do none of this, because the
burden of writing all those checks and methods is too heavy, especially
when you’re not always sure you’ll need them. For example, you might find
that __repr__ is useful in your program only that one time you’re trying to
debug or trace it and decide to print objects in the standard output—and
no other times.
The attr library aims for a straightforward solution by providing generic boilerplate for all your classes and generating much of the code for you. You can install it using pip with the command pip install attrs (the package is named attrs, though it is imported as attr). Get ready to enjoy!
Once installed, the attr.s decorator is your entry point into the wonder-
ful world of attr. Use it above a class declaration and then use the function
attr.ib() to declare attributes in your classes. Listing 13-12 shows a way to
rewrite Listing 13-11 using attr.
import attr

@attr.s
class Car(object):
    color = attr.ib()
    speed = attr.ib(default=0)
When declared this way, the class automatically gains a few useful meth-
ods for free, such as __repr__, which is called to represent objects when they
are printed on stdout in the Python interpreter:
>>> Car("blue")
Car(color='blue', speed=0)
This output is cleaner than the default __repr__, which would have printed something like <__main__.Car object at 0x...>.
You can also add more validation on your attributes by using the validator
and converter keyword arguments.
import attr

@attr.s
class Car(object):
    color = attr.ib(converter=str)
    speed = attr.ib(default=0)

    @speed.validator
    def speed_validator(self, attribute, value):
        if value < 0:
            raise ValueError("Value cannot be negative")
Listing 13-14 shows how using the frozen parameter changes the behavior
of the Car class: it can be hashed and therefore stored in a set, but objects can-
not be modified anymore.
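A quick sketch of that behavior (hash=True is made explicit here, an assumption for clarity):

>>> import attr
>>> @attr.s(frozen=True, hash=True)
... class Car(object):
...     color = attr.ib()
...
>>> car = Car("blue")
>>> car.color = "red"
Traceback (most recent call last):
  ...
attr.exceptions.FrozenInstanceError
>>> {car}
{Car(color='blue')}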
In summary, attr provides the implementation for a ton of useful meth-
ods, thereby saving you from writing them yourself. I highly recommend
leveraging attr for its efficiency when building your classes and modeling
your software.
Summary
Congratulations! You made it to the end of the book. You’ve just upped
your Python game and have a better idea of how to write efficient and pro-
ductive Python code. I hope you enjoyed reading this book as much as I
enjoyed writing it.
Python is a wonderful language and can be used in many different
fields, and there are many more areas of Python that we did not touch on
in this book. But every book needs an ending, right?
I highly recommend profiting from open source projects by reading the
available source code out there and contributing to it. Having your code
reviewed and discussed by other developers is often a great way to learn.
Happy hacking!
Serious Python is set in New Baskerville, Futura, Dogma, and The Sans Mono
Condensed.
Updates
Visit https://nostarch.com/seriouspython/ for updates, errata, and other information.
phone: 1.800.420.7240 or 1.415.863.9900
email: sales@nostarch.com
web: www.nostarch.com