Inside The Python Virtual Machine (En)
Inside The Python Virtual Machine (En)
Inside The Python Virtual Machine (En)
Obi Ike-Nwosu
This book is for sale at http://leanpub.com/insidethepythonvirtualmachine
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
4. Python Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1 PyObject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Under the cover of Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Type Object Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4 Minting type instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 Objects and their attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6 Method Resolution Order (MRO) . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5. Code Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1 Exploring code objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Code Objects within other code objects . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 Code Objects in the VM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6. Frames Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.1 Allocating Frame Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Python and CPython are used interchangeably in this text but any mention of Python
refers to CPython which is the version of python implemented in C. Other implementations
include PyPy which is python implemented in a restricted subset of Python, Jython which
is python implemented on the Java Virtual Machine etc.
I like to think of the execution of a python program as split into two or three main phases as listed
below depending on how the interpreter is invoked. These are covered in different measures within
this write-up:
1. Initialization : This involves the set up of the various data structures needed by the python
process. This will probably only counts when a program is being executed non-interactively
through the interpreter shell.
2. Compiling : This involves activities such as parsing source code to build syntax trees, creation
of abstract syntax trees, building of symbol tables and generation of code objects.
3. Interpreting : This involves the actual execution of generated code objects within some
context.
The process of generating parse trees and abstract syntax trees from source code is language agnostic
so the same methods that apply to other languages also apply to Python; as a result, not much
is on this subject is covered here. On the other hand, the process of building symbol tables and
code objects from the Abstract Syntax tree is the more interesting part of the compilation phase
which is handled in a more or less python specific way and attention is paid to it. The interpreting
of compiled code objects and all the data structures that are used in the process is also covered.
Topics that will be touched upon include but are not limited to the process of building symbol tables
1
http://tpq.io/
2
http://pandas.pydata.org/
Introduction 2
and generating code objects, python objects, frame objects, code objects, function objects, python
opcodes, the interpreter loop, generators and user defined classes.
This material is aimed at anybody that is interested in gaining some insight into how the
CPython virtual machine functions. It is assumed that the user is already familiar with python and
understands the fundamentals of the language. As part of this expose on the virtual machine, we go
through a considerable amount of C code so a user that has a rudimentary understanding of C will
find it easier to follow. After all said and done, all that is needed to get through this material is a
healthy desire to want to learn about the CPython virtual machine.
This work is an expanded version of personal notes taken while investigating the inner working of
the python interpreter. There is substantial amount of wisdoms in videos available in Pycon videos3 ,
school lectures4 and blog write-ups5 . This work will not be complete without acknowledging these
fantastic sources that have been leveraged in the production of this work.
At the end of this book, a user should be able to understand the intricacies of how the Python
interpreter executes a program. This includes the various steps involved in executing the program
and the various data structures that are crucial to the execution of such program. We start off with
a gentle bird’s eye view of what happens when a trivial program is executed by passing the module
name to the interpreter at the commandline. The CPython executable can be installed from source
by following the instructions at the Python Developer’s Guide6 .
3
https://www.youtube.com/watch?v=XGF3Qu4dUqk
4
http://pgbovine.net/cpython-internals.htm/
5
https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/
6
https://docs.python.org/devguide/index.html#
2. The View From 30,000ft
This chapter provides a high level expose on how the interpreter goes about executing a python
program. In subsequent chapters, we zoom in on the various pieces of puzzle and provide a more
detailed description of such pieces. Regardless of the complexity of a python program, this process is
the same. The excellent explanation of this process provided by Yaniv Aknin in his Python Internal
series1 provides some of the basis and motivation for this discussion.
Given a python module, test.py, this module can be executed at the command line by passing it
as an argument to the python interpreter program as such $python test.py. This is just one of the
ways of the invoking the python executable - we could start the interactive interpreter, execute a
string as code etc but these other methods of execution are not of interest to us. When the module
is passed as an argument to the executable on the command line, figure 2.1 best captures the flow
of various activities that are involved in the actual execution of the supplied module.
The python executable is a C program just like any other C program such as the linux kernel or
a simple hello world program in C so pretty much the same process happens when the python
executable is invoked. Take a moment to grasp this, the python executable is just another program
1
https://tech.blog.aknin.name/2010/04/02/pythons-innards-introduction/
The View From 30,000ft 4
that runs your own program. The same argument can be made for the relationship between C and
assembly or llvm. The standard process initialization which depends on the platform the executable
is running on starts once the python executable is invoked with module name as argument,
This writeup assumes a unix based operating system so some speicifics may differ when a
windows operating system is being used.
The C runtime performs all its initialisation magic - libraries are loaded, environment variables are
checked or set then the python executable’s main method is run just like any other C program.
The python executable’s main program is located in the ./Programs/python.c file and it handles
some initialization such as making copies of program command line arguments that were passed
to the module. The main function then calls the Py_Main function located in the ./Modules/main.c
which handles the interpreter initialization process - parsing commandline arguments and setting
program flags2 , reading environment variables, running hooks, carrying out hash randomization etc.
As part of the initialization process, Py_Initialize from pylifecycle.c is called; this handles the
initialization of the interpreter and thread state data structures - two very important data structures.
A look at the data structures definitions for the interpreter and thread states provides some context
into the functions of these data structures. The interpreter and thread states are just structures
with pointers to fields that hold information that is required for the execution of a program. The
interpreter state typedef (just assume that this is C lingo for type definition though this is not entirely
true) is provided in listing 2.1.
18 PyObject *builtins_copy;
19 } PyInterpreterState;
Anyone who has used the Python programming language long enough may recognize a few of the
fields mentioned in this structure (sysdict, builtins, codec)*.
1. The *next field is a reference to another interpreter instance as multiple python interpreters
can exist within the same process.
2. The *tstate_head field points to the main thread of execution - in the event that the python
program is multithreaded then the interpreter is shared by all threads created by the program
- the structure of a thread state is discussed shortly.
3. The modules, modules_by_index, sysdict, builtins and importlib are self explanatory -
they are all defined as instances of PyObject which is the root type for all python objects in
the virtual machine world. Python objects are covered in more detail in the chapters that will
follow.
4. The codec* related fields hold information that help with the location and loading of
encodings. These are very important for decoding bytes.
The execution of a program must occur within a thread. The thread state structure contains all the
information that is needed by a thread to execute python some code object - a part of the thread
data structure is shown in listing 2.2.
19 PyObject *curexc_type;
20 PyObject *curexc_value;
21 PyObject *curexc_traceback;
22
23 PyObject *exc_type;
24 PyObject *exc_value;
25 PyObject *exc_traceback;
26
27 PyObject *dict; /* Stores per-thread state */
28 int gilstate_counter;
29
30 ...
31 } PyThreadState;
The interpreter and the thread state data structures are discussed in more details in subsequent
chapters. The initialization process also sets up the import mechanisms as well as rudimentary stdio.
Once all the initialization is complete, the Py_Main function invokes the run_file function also
located in the main.c module. The following series of function calls: PyRun_AnyFileExFlags ->
PyRun_SimpleFileExFlags->PyRun_FileExFlags->PyParser_ASTFromFileObject are made to the
PyParser_ASTFromFileObject function. The PyRun_SimpleFileExFlags function call creates the
__main__ namespace in which the file contents will be executed. It also checks if a pyc version
of the file exists - the pyc file is just a file containing the compiled version of the file being
executed. In the case that the file has a pyc version then an attempt will be made to read it in
as binary and then run it. In this case, there is no pyc file so the PyRun_FileExFlags is called and so
on. The PyParser_ASTFromFileObject function calls the PyParser_ParseFileObject which reads
the module content and builds a parse tree from it. The parse tree created is then passed to the
PyParser_ASTFromNodeObject which then goes ahead to create an abstract syntax tree from the
parse tree.
If you have been following the actual C source code by now you must have run into the
Py_INCREF and Py_DECREF by now. These are memory management functions that will be
discussed later on in more details. CPython manages the object life cycle using reference
counting; whenever a new reference to an object is made the reference is increased with
the Py_INCREF while whenever a reference goes out of scope the reference is reduced with
the Py_DECREF functions.
The AST generated is then passed to the run_mod function. This function invokes the PyAST_-
CompileObject function that creates code objects from the AST. Do note that the bytecode
generated during the call to PyAST_CompileObject is passed through a simple peephole optimizer
that carries out low hanging optimization of the generated bytecode before the code objects
are created. The run_mod function then invokes PyEval_EvalCode from the ceval.c file on the
The View From 30,000ft 7
The ./Include/opcodes.h header file contains a full listing of all the instruction/opcodes for the
python virtual machine. The opcodes are pretty straight forward conceptually. Take our example
from listing 2.3 that has a set of four instructions - the LOAD_FAST loads the value of the its
argument (x in this case) onto an evaluation (value) stack. The python virtual machine is a stack
based virtual machine so this means that values for evaluations by an opcode are gotten from a
stack and results of an evaluation are placed back on the stack for further use by other opcodes. The
BINARY_MULTIPLY opcode then pops two items from the value stack, performs binary multiplication
on both values and places the result of the binary multiplication back on the value stack. The RETURN
VALUE instruction pops a value from the stack, sets the return value to object to this value and breaks
out of the interpreter loop. From the disassembly in listing 2.3, it is pretty clear that this rather
simplistic
explanation of the operation of the interpreter loop leaves out a number of details that will be
discussed in subsequent chapters. A few of these outstanding questions may include.
The View From 30,000ft 8
1. Where are the values such as that loaded by the LOAD_FAST instruction gotten
from ?
2. Where do arguments that are used as part of instructions come from ?
3. How are nested function and method calls managed ? 4 How does the interpreter
loop handle exceptions ?
After all the instructions have been the executed, the Py_Main function continues its execution
but this time around it starts the clean up process. Just as Py_Initialize was called to perform
initialization during the interpreter startup, Py_FinalizeEx is invoked to do some clean-up work;
this clean-up process involves waiting for threads to exit, calling any exit hooks and also freeing up
any memory allocated by the interpreter that is still in use.
The above description is a high-level description of the processes the python executable goes through
to execute a python program. As noted previously, alot of details are stil left to be answered and in
the chapters that will follow, we will dig into each of the stages that have been covered and try
to provide details on each of these stages. We get into action starting with a description of the
compilation process in the next chapter.
3. Compiling Python Source Code
Although python is not popularly regarded as a compiled language, it actually is one. During
compilation, some python source code is transformed into bytecode that is executable by the
virtual machine. This compilation process in python is however a straightforward process that does
not involve lots of complicated steps. The compilation process of a python program involves the
following steps in the given order.
Parsing source code into a parse tree and transforming that parse tree into an AST is a standard
process and python does not introduce any complicated nuances so the focus of this chapter is
on the transformation of an AST into a control flow graph and the emission of code object from
the control flow graph. For anyone interested in parse tree and AST generation, the dragon book1
provides a much more indepth tour de force of both topics.
1
https://www.amazon.co.uk/Compilers-Principles-Techniques-Alfred-Aho/dp/0201100886
2
https://en.wikipedia.org/wiki/LL_parser
Compiling Python Source Code 10
When executing a module passed to the interpreter on the command line, a call to the PyParser_-
ParseFileObject function initiates the parsing of the module. This function calls the tokenization
function, PyTokenizer_FromFile, passing the module file name as argument. The tokenization
function breaks up the content of the module into legal python tokens or throws an exception when
an illegal value is found.
Compiling Python Source Code 11
1. identifiers : These are names that defined by a programmer. They include function names,
variable names, class names etc. These must conform to the rules of identifiers as specified in
the python documentation.
2. operators: these are special symboles such as +, * that operate on data values and produce
results.
3. delimiters: this group of symbols serve to group expressions, provide punctuations and
assignment. Examples in this category include (, ), {,}, =, *= etc.
4. literals: these are symbols that provide a constant value for some type. We have the string and
byte literals such as "Fred", b"Fred" and numeric literals which include integer literals such
as 2, floating point literal such as 1e100 and imaginary literals such as 10j.
5. comments: these are string literals that start with the hash symbol. Comment tokens always
end at the end of the physical line.
6. NEWLINE: this is a special token that denotes the end of a logical line.
7. INDENT and DEDENT: These token are used to represent indentation levels which group
compound statements.
A group of tokens delineated by the NEWLINE token make up a logical line hence we could say
that a python program is made up of a sequence of logical lines with each logical line delineated
by the NEWLINE token. These logical lines maps to python statements. Each of these logical lines
are made up of a number of physical lines that are each terminated by an end-of-line sequence. In
python, most times logical lines map to physical line so we have logical line delimited by end-of-
line characters. Compound statements can span multiple physical lines such as shown in figure 3.0.
Logical lines can be joined together implicitly when expression are in parentheses, square brackets or
curly braces or explicitly by the use of the backslash character. Indentation also plays a central role
in grouping python statements. One of lines in the python grammar is thus suite: simple_stmt
| NEWLINE INDENT stmt+ DEDENT so one of the major task of the python tokenizer is to generate
indent and dedent tokens that go into the parse tree. The tokenizer uses a stack to keep track of
indents and uses the algorithm in listing 3.1 to generate INDENT and DEDENT tokens.
Compiling Python Source Code 12
Listing 3.1: Python indentation algorithm for generting INDENT and DEDENT tokens
The PyTokenizer_FromFile function in the Parser/parsetok.c module scans the python source file
from left to right and top to bottom tokenizing the content of the file. Whitespaces characters other
than terminators serve to delimit tokens but are not compulsory. Where there is some ambiguity
such as in 2+2, a token comprises the the longest possible string that forms a legal token reading
from right to left; in this example the tokens are the literal 2, the operator + and the literal 2.
The tokens generated from the tokenizer are passed to the parser which attempts to build a parse
tree according to the python grammar of which a subset is specified in listing 3.0. When the parser
encounters a token that violates the grammar, a SyntaxError exception is thrown. The output from
the parser is a parse tree. The parser3 python module provides limited access to the parse tree of a
block of python code and it is used in listing 3.2 to get a concrete demonstration of parse trees.
Listing 3.2: Using the parser module to obtain the parse tree of python code
10 [294,
11 [263,
12 [1, 'def'],
13 [1, 'hello_world'],
14 [264, [7, '('], [8, ')']],
15 [11, ':'],
16 [303,
17 [4, ''],
18 [5, ''],
19 [269,
20 [270,
21 [271,
22 [277,
23 [280,
24 [1, 'return'],
25 [330,
26 [304,
27 [308,
28 [309,
29 [310,
30 [311,
31 [314,
32 [315,
33 [316,
34 [317,
35 [318,
36 [319,
37 [320,
38 [321,
39 [322, [323, [3, '"hello world"']]]]]]]]]]]]]]]]]\
40 ]]],
41 [4, '']]],
42 [6, '']]]]],
43 [4, ''],
44 [0, '']]
45 >>>
The parser.suite(source) call in listing 3.2 above returns a parse tree (ST) object, a python
intermediate representation for a parse tree, from the supplied source code so long as the source
code is syntactically correct. The call to parser.st2list returns the actual parse tree represented
in form of a python list. The first items in the lists, the integer, identifies the production rule in the
python grammar.
Compiling Python Source Code 14
Figure 3.0: A parse tree for listing 3.2 (function that returns the ‘hello world’ string)
Figure 3.0 is a tree diagram showing the same parse tree from listing 3.2 with some tokens stripped
away and one can see the part of the grammar each of the integer value represents. These production
rules are all specified in the Include/token.h (terminals) and Include/graminit.h (terminals)
header files.
In the CPython virtual machine, a tree data structure is used to represent the parse tree. Each
production rule is a node on the tree data structure. This node data structure is shown in listing
3.3 from the Include/node.h.
Listing 3.3: The node data structure used in the python virtual machine
As the parse tree is walked, the nodes can be queried for their type, their children if any, the line
Compiling Python Source Code 15
number that led to the creation of the given node and so on. The macros for interacting with parse
tree nodes are also defined in the Include/node.h file.
Listing 3.4: Using the ast module to manipulate the AST of python source code
The definitions of the AST nodes for the various Python are found in the file Parser/Python.asdl
file. Most definitions in the AST correspond to a particular source construct such as an if statement
or an attribute lookup. The ast module bundled with the python interpreter provides us with the
ability to manipulate a python AST. Tools such as codegen4 can take an AST representation in
python and output the corresponding python source code. In the CPython implementation, the AST
nodes are represented by C structures as defined in the Include/Python-ast.h. These structures are
actually generated by python code; the Parser/asdl_c.py module generates this file from the AST
asdl definition. For example a cross-section of the definition of a statement node is shown in listing
3.5.
4
https://pypi.python.org/pypi/codegen/1.0
Compiling Python Source Code 16
1 struct _stmt {
2 enum _stmt_kind kind;
3 union {
4 struct {
5 identifier name;
6 arguments_ty args;
7 asdl_seq *body;
8 asdl_seq *decorator_list;
9 expr_ty returns;
10 } FunctionDef;
11
12 struct {
13 identifier name;
14 arguments_ty args;
15 asdl_seq *body;
16 asdl_seq *decorator_list;
17 expr_ty returns;
18 } AsyncFunctionDef;
19
20 struct {
21 identifier name;
22 asdl_seq *bases;
23 asdl_seq *keywords;
24 asdl_seq *body;
25 asdl_seq *decorator_list;
26 } ClassDef;
27 ...
28 }v;
29 int lineno;
30 int col_offset
31 }
The union type in the listing 3.5 is a C type that is used to represent a type that can take one any
of the types listed in the union. The PyAST_FromNode function in the Python/ast.c module handles
the generation of the AST from a given parse tree. With the AST generated, it is now time for the
generation of bytecode from the AST.
Compiling Python Source Code 17
>>> x = 5
In the above, example, x is a name that references the object, 5. The process of assigning a reference
to 5 to x is called binding. A binding causes a name to be associated with an object in the innermost
scope of the currently executing program. Bindings may occur during a number of instances such
as during variable assignment or function or method call when the supplied parameter is bound to
the argument. It is important to note that names are just symbols and they have no type associated
with them; names are just references to objects that actually have types.
Code Blocks
Code blocks are central to python program and understanding them for what they are is paramount
to understanding the internals of the python virtual machine. A code block is a piece of program
code that is executed as a single unit in python. Modules, functions and classes are all examples
of code blocks. Commands typed in interactively at the REPL, script commands run with the -c
option are also code blocks. A code block has a number of name-spaces associated with it. For
example, a module code block has access to the global name-space while a function code block
has access to the local as well as the global name-spaces.
Namespaces
A namespace as the name implies is a context in which a given set of names is bound to objects.
Namespaces are implemented as dictionary mappings in python. The builtin namespace is an
example of a name-space that contains all the built-in functions and this can be accessed by entering
__builtins__.__dict__ at the terminal (the result is of a considerable amount). The interpreter
has access to multiple namespaces including the global name-space, the builtin namespace and the
local namespace. These namespaces are created at different times and have different lifetimes. For
example, a new local namespace is created at the invocation of a function and forgotten when the
function exits or returns. The global namespace is created at the start of the execution of a module
and all names defined in this namespace are available module wide while the built-in namespace
comes into existence when the interpreter is invoked and contains all the builtin names. These
three name-spaces are the main namespaces available to the interpreter.
Scopes
A scope is an area of a program in which a set of name bindings (namespaces) is visible and directly
accessible without using any dot notation. At runtime, the following scopes may be available.
When a name is used in python, the interpreter searches the namespaces of the scopes in ascending
order as listed above and if the name is not found in any of the namespaces, an exception is raised.
Python supports static scoping also known as lexical scoping; this means that the visibility of a set
of name bindings can be inferred by only inspecting the program text.
Note
Python has a quirky scoping rule that prevents a reference to an object in the global scope from
being modified in a local scope; such an attempt will throw an UnboundLocalError exception. In
order to modify an object from the global scope within a local scope, the global keyword has to
be used with the object name before modification is attempted. The following example illustrates
this.
>>> a = 1
>>> def inc_a(): a += 2
...
>>> inc_a()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in inc_a
UnboundLocalError: local variable 'a' referenced before assignment
In order to modify the object from the global scope, the global statement is used as shown in the
following snippet.
Listing A3.1: Using the global keyword to modify a global variable from a function
>>> a = 1
>>> def inc_a():
... global a
... a += 1
...
>>> inc_a()
>>> a
2
Python also has the nonlocal keyword that is used when there is a need to modify a variable bound
in an outer non-global scope from an inner scope. This proves very handy when working with
nested functions (also referred to as closures). A very trivial illustration of the nonlocal keyword
Compiling Python Source Code 20
in action is shown in the following snippet that defines a simple counter object that counts in
ascending order.
The series of function calls run_mod -> PyAST_CompileObject -> PySymtable_BuildObject trig-
gers the process of building a symbol table. Two of the arguments to the PySymtable_BuildObject
function are the AST previously generated and the file name of the module. The algorithm for
building the symbol table is split into two parts. During the first part of the process, each node of
the AST that is passed as argument to the PySymtable_BuildObject is visited in order to build up
a collection of symbols used within the AST. A very simple description of this process is given in
listing 3.6 and the terms used in this will be more obvious when we discuss the data structures used
in building a symbol table.
Compiling Python Source Code 21
After the first pass of algorithm on the AST, the symbol table entries contain all names that have
been used within the module but they do not have contextual information about such names. For
example, the interpreter cannot tell if a given variable is a global, local or free variable. A call to
the symtable_analyze function in the Parser/symtable.c initiates the second phase of the symbol
table generation. This phase of the algorithm assigns scopes (local, global or free) to the symbols
gathered from the first pass. The comments in the Parser/symtable.c are quite informative and
are paraphrased below to provide an insight into the second phase of the symbol table construction
process.
The symbol table requires two passes to determine the scope of each name. The first
pass collects raw facts from the AST via the symtable_visit_* functions while the second
pass analyzes these facts during a pass over the PySTEntryObjects created during pass
1.
When a function is entered during the second pass, the parent passes the set of all
name bindings visible to its children. These bindings are used to determine if non-local
variables are free or implicit globals. Names which are explicitly declared nonlocal must
exist in this set of visible names - if they do not, a syntax error is raised. After doing the
local analysis, it analyzes each of its child blocks using an updated set of name bindings.
There are also two kinds of global variables, implicit and explicit. An explicit global is
declared with the global statement. An implicit global is a free variable for which the
Compiling Python Source Code 22
compiler has found no binding in an enclosing function scope. The implicit global is
either a global or a builtin.
Python’s module and class blocks use the xxx_NAME opcodes to handle these names to
implement slightly odd semantics. In such a block, the name is treated as global until
it is assigned to; then it is treated as a local.
The children update the free variable set. If a local variable is added to the free variable
set by the child, the variable is marked as a cell. The function object being defined
must provide runtime storage for the variable that may outlive the function’s frame.
Cell variables are removed from the free set before the analyze function returns to its
parent.
Although the comments try to explain the process in clear language, there are some confusing aspects
such the parent passes the set of all name bindings visible to its children - what parent and what
children are being referred to? To get an understanding of such terminology, we have to look at the
data structures that are used to in the process of creating a symbol table.
The symbol table data structure is shown in listing 3.7. One can think of this as a table that is
composed of entries that hold information on the names used in different code blocks of a given
module.
1 struct symtable {
2 PyObject *st_filename; /* name of file being compiled */
3 struct _symtable_entry *st_cur; /* current symbol table entry */
4 struct _symtable_entry *st_top; /* symbol table entry for module */
5 PyObject *st_blocks; /* dict: map AST node addresses
6 * to symbol table entries */
7 PyObject *st_stack; /*list: stack of namespace info */
8 PyObject *st_global; /*borrowed ref to
9 st_top->ste_symbols*/
10 int st_nblocks; /* number of blocks used. kept for
11 consistency with the corresponding
12 compiler structure */
Compiling Python Source Code 23
Once again, going through the _symtable_entry data structure in Include/symtable.h is very
helpful in getting an understanding for what this data structure does. This data structure is shown
in listing 3.8.
Compiling Python Source Code 24
The comments in the source code explain what each field does. The ste_symbols field is a mapping
that contains the symbols/names that are encountered during the analysis of the code block; the
flags which the symbols map to are numeric values that provide information on the context in
which symbol/name is being used. For example, a symbol may be a function argument or a global
statement definition. Some examples of these flags as defined in the Include/symtable.h module
are shown in listing 3.9.
Compiling Python Source Code 25
Returning to our discussion on symbol tables, assuming a module that contains the code shown in
listing 3.10 is being compiled. After the symbol table is built, there are three symbol table entries.
def make_counter():
count = 0
def counter():
nonlocal count
count += 1
return count
return counter
The first entry is that of the enclosing module and it will have make_counter defined with a scope of
local. The next symbol table entry will be that of function make_counter and this will have the count
and counter names marked as local. The final symbol table entry will be that of the nested counter
function. This will have the count variable marked as free. One thing to note is that although
make_counter is defined as local in the symbol table entry for the module block, it is regarded as
globally defined within the module code block as the *st_global points to the the *st_top symbols
which in this case is that of the enclosing module.
code paths that can be taken during the execution of a program. In the second step, the generated
control flow graph is flattened using a post-order depth first search transversal. After the graph is
flattened, the jump offsets are then calculated and used as instruction argument for byte code jump
instructions. The code object is emitted from the this set of instructions. To get a good understanding
of this process, consider the the fizzbuzz function in listing 3.11.
1 def fizzbuzz(n):
2 if n % 3 == 0 and n % 5 == 0:
3 return 'FizzBuzz'
4 elif n % 3 == 0:
5 return 'Fizz'
6 elif n % 5 == 0:
7 return 'Buzz'
8 else:
9 return str(n)
The AST for the function for this function is shown in figure 3.2.
This AST from figure 3.2 when compiled into a CFG gives a graph that is similar to that shown in
figure 3.3; empty blocks have been omitted from the figure. An inspection of this graph provides
some intuition behind the basic blocks. The basic blocks have a single entry point but can have
multiple exits. These blocks are described in more detail next.
Figure 3.3: Control flow graph for the fizzbuzz function from listing 3.11. The straight line represent normal
straight line execution of code while the curved lines represent jumps.
In the following descriptions, only the actual instructions have been included; some instructions
Compiling Python Source Code 28
need an argument but these have not been included to enable us focus on the main topic at hand.
1. Block 1 - This block contains instructions that map to the BoolOp node of the AST in figure
3.2. The instructions in this block implement the operation n%3==0 and n%5==0 using the
following set of eleven instructions.
LOAD_FAST
LOAD_CONST
BINARY_MODULO
LOAD_CONST
COMPARE_OP
JUMP_IF_FALSE_OR_POP
LOAD_FAST
LOAD_CONST
BINARY_MODULO
LOAD_CONST
COMPARE_OP
Surprisingly, the rest of the if node (the actual test that determines if the clause should be
executed) is not included within this block. The reason for this will become clearer as the
second block is discussed. As the image in figure 3.3 shows, there are two ways to exit this
block - either via a straight execution of all opcodes or a jump to block 2 when JUMP_IF_-
FALSE_OR_POP is executed.
2. Block 2 - This block maps to the first if node encapsulating the if test and subsequent clause
. The second block contains the following four instructions.
POP_JUMP_IF_FALSE
LOAD_CONST
RETURN_VALUE
JUMP_FORWARD
As will be seen in subsequent chapters, when the interpreter is executing the byte code
instructions for an if statement, it reads an object from the value stack and depending on
the truth value of that object, it either executes the next bytecode instruction or jumps to an
instruction in a different part of the set of instructions and continues execution from there.
The POP_JUMP_IF_FALSE is the instruction that handles this; this opcode takes an argument
that specifies the destination of such a jump. One might wonder why the instructions for
BoolOp node and if statements are in different blocks. To understand this, recall that python
uses short circuit evaluation for boolean operations so in this case, if n%3==0 evaluates to false
then n%5==0 is not evaluated. Looking at the instructions from the first block, one can notice
the JUMP_IF_FALSE_OR_POP instruction after the first comparison. This instruction is a variant
of a jump and hence needs a target. Think about this for a second and the need for the different
blocks becomes obvious. The JUMP_IF_FALSE_OR_POP needs a target at which to continue the
Compiling Python Source Code 29
execution of the instructions when the first expression in the boolean expression evaluates
to false due to the short circuit operation - the target in this case is the POP_JUMP_IF_FALSE
instruction for the if statement. For a jump to be possible, we need a target to jump to and with
the if statement instructions in a different block, the offset by which a jump is made can then
be calculated. If all the components of the boolean expression are evaluated then execution
will continue as normal with instructions in the if block executed after all instructions in the
BoolOp block are executed.
3. Block 3 - This maps to the first orElse AST node of the third block and it contains the following
9 instructions.
LOAD_FAST
LOAD_CONST
BINARY_MODULO
LOAD_CONST
COMPARE_OP
POP_JUMP_IF_FALSE
LOAD_CONST
RETURN_VALUE
JUMP_FORWARD
Observe that the elif statement and the n%3==0 as well as body of the statement are all in
the same block and it is easy to see why this is so. The only entry into this block is through
a jump into this block and the node can be exited either through the return instruction or a
jump when the if test fails.
4. Block 4 is a mirror image of block 3 in terms of instructions but with the arguments to the
instructions differing.
5. Block 5 maps to the final orElse AST node and contains the following 4 instructions.
LOAD_GLOBAL
LOAD_FAST
CALL_FUNCTION
RETURN_VALUE
The LOAD_GLOBAL takes the str function as argument and loads it into the value stack. LOAD_-
FAST loads the argument n onto the stack while RETURN_VALUE returns the value that is on
the stack from the execution of the CALL_FUNCTION instruction i.e str(n).
Like in the previous section, we look at the data structures that are used in building the basic blocks
to get a better grasp of the process.
Compiling Python Source Code 30
Figure 3.4 shows the relationship between the main data structures used in the process of generating
the basic blocks that make up the control flow graph.
Figure 3.4: The four major data structures used in generating a code object.
At the very top level is the compiler data structure which captures the global compilation process
for a module. The data structure is defined in listing 3.12.
1 struct compiler {
2 PyObject *c_filename;
3 struct symtable *c_st;
4 PyFutureFeatures *c_future; /* pointer to module's __future__ */
5 PyCompilerFlags *c_flags;
6
7 int c_optimize; /* optimization level */
8 int c_interactive; /* true if in interactive mode */
9 int c_nestlevel;
Compiling Python Source Code 31
10
11 struct compiler_unit *u; /* compiler state for current block */
12 PyObject *c_stack; /* Python list holding compiler_unit
13 ptrs */
14 PyArena *c_arena; /* pointer to memory allocation arena */
15 };
For every module being compiled, a compiler data structure is initialised; as the AST generated
for the module is walked, a compiler_unit data structure is generated for each code block that is
encountered within the AST.
The compiler_unit data structure as shown in listing 3.13 below captures information needed to
generate the required byte code instructions for a code block. Most of the fields defined in the
compiler_unit will be encountered when we look at code objects.
1 struct compiler_unit {
2 PySTEntryObject *u_ste;
3
4 PyObject *u_name;
5 PyObject *u_qualname; /* dot-separated qualified name (lazy) */
6 int u_scope_type;
7
8 /* The following fields are dicts that map objects to
9 the index of them in co_XXX. The index is used as
Compiling Python Source Code 32
The u_blocks and u_curblock fields reference the basic blocks that make up the code block being
compiled. The *u_ste field is a reference to a symbol table entry for the code block being compiled.
The rest of the fields have pretty self explanatory names. The different nodes that make up the code
block are walked during the compilation process and depending on whether a given node type begins
a basic block or not, a basic block containing the nodes instructions is created or instructions for the
node are added to an existing basic block. Node types that may begin a new basic block include but
are not limited to the following.
1. Function nodes.
2. Jump targets.
3. Exception handlers.
4. Boolen operations. etc.
Compiling Python Source Code 33
The basic block data structure is the rather interesting data structure in the process of generating a
control flow graph. A basic block is a sequence of instructions that has one entry point but multiple
exit points. The definition for the basic_block data structure as used in the python virtual machine
is shown in listing 3.14.
As previously mentioned, the CFG is basically composed of basic blocks and connections between
these basic blocks. The *b_instr field references an array of instruction data structures and each of
these data structure holds a bytecode instruction. These bytecode can be found in Include/opcode.h
header file. The instruction data structure is a shown in listing 3.15.
Compiling Python Source Code 34
1 struct instr {
2 unsigned i_jabs : 1;
3 unsigned i_jrel : 1;
4 unsigned char i_opcode;
5 int i_oparg;
6 struct basicblock_ *i_target; /* target block (if jump instruction) \
7 */
8 int i_lineno;
9 };
Take a look at the CFG for the fizzbuzz function, we can see that there are actually two ways,
execution can get from block 1 to block 2. The first is through normal execution - all the instructions
in block 1 are executed so the flow of execution continues in block 2. The other is by a way of the
jump instruction that exist just after the first comparison operation. Now, the target of this jump is a
basic block but the code objects that are actually executed know nothing of basic blocks – the code
block only has a stream of bytecodes and we can only index such stream using offset. We have to
take the implicit graph created with blocks as jump targets and somehow replace such blocks with
offsets into an array of instructions. This is what the process of assembling the basic blocks does.
Graph Traversal
In a post-order depth-first traversal of a graph, we recursively visit the left child node of
the graph followed by the right child node of the graph then the node itself. In our graph in
figure 3.5, when the graph is flattened using a post-order traversal, the order of the nodes
is H -> D -> I -> J -> E -> B -> K -> L -> F -> G -> C -> A. This is in contrast to
an pre-order traversal that produces A -> B -> D -> H -> E -> I -> J -> C -> F -> K
-> L -> G or a in-order traversal that produces H -> D -> B -> I -> E -> J -> A -> K
-> L -> F -> C -> G
The CFG of the fizzbuzz function given in listing 3.3 is a relatively simple graph - the result of the
post-order traversal for the fizzbuzz is block 5 -> block 4 -> block 3 -> block 2 -> block
1. Once this graph has been linearized (i.e flattened), the offsets for instructions jumps can then be
calculated by calling the assemble_jump_offsets function on the flattened graph.
Assembling the jump offsets takes place in two phases. In the first phase, the offset of every
instruction into the instruction array is calculated as shown in the snippet from listing 3.16. This
is a simple loop that works from the end of the flattened array building up the offset from 0.
Compiling Python Source Code 36
1 ...
2 totsize = 0;
3 for (i = a->a_nblocks - 1; i >= 0; i--) {
4 b = a->a_postorder[i];
5 bsize = blocksize(b);
6 b->b_offset = totsize;
7 totsize += bsize;
8 }
9 ...
In the second phase of the assembling of jump offsets, the jump targets for jump instructions is then
calculated as shown in listing 3.17. This involves computing relative jumps for relative jumps and
replacing the targets of absolute jumps with instruction offsets.
1 ...
2 for (b = c->u->u_blocks; b != NULL; b = b->b_list) {
3 bsize = b->b_offset;
4 for (i = 0; i < b->b_iused; i++) {
5 struct instr *instr = &b->b_instr[i];
6 int isize = instrsize(instr->i_oparg);
7 /* Relative jumps are computed relative to
8 the instruction pointer after fetching
9 the jump instruction.
10 */
11 bsize += isize;
12 if (instr->i_jabs || instr->i_jrel) {
13 instr->i_oparg = instr->i_target->b_offset;
14 if (instr->i_jrel) {
15 instr->i_oparg -= bsize;
16 }
17 instr->i_oparg *= sizeof(_Py_CODEUNIT);
18 if (instrsize(instr->i_oparg) != isize) {
19 extended_arg_recompile = 1;
20 }
21 }
22 }
23 }
24 ...
Compiling Python Source Code 37
With jump offsets calculated, the instructions contained in the flattened graph are emitted in reverse
post order from the traversal. The reverse post order is a topological sorting of the CFG. This means
for every edge from vertex u to vertex v, u comes before v in the sorting order. The reason for this is
obvious; we want a node that jumps to another node to always come before that jump target. With
the emission of bytecode complete, the code objects can then be assembled for each code block using
the emitted bytecode and information contained in the symbol table. The generated code object is
returned to the calling function marking the end of the compilation process.
4. Python Objects
In this chapter, we look at python objects and their implementation in the CPython virtual machine.
A fundamental understanding of how the python objects are organized is important to groking the
internals of the python virtual machine. Most of the source that is discussed here is available from
the the Include/ and Objects/ directories. Unsurprisingly, the implementation of the object system
in python is quite complex and we try to avoid getting bogged down in the gory details of the C
implementation. To kick this off, we start by looking at the PyObject structure - the workhorse of
the python object system.
4.1 PyObject
A cursory inspection of the CPython source code will show the ubiquity of the PyObject structure.
Infact, as we will see later on in this treatise, when the interpreter loop is working on values on the
evaluation stack, all of such values are regarded as PyObjects. For want of a better term, we refer
to this as the super class of all python objects. Values are actually never declared as PyObjects but
a pointer to any object can be cast to a PyObject. In layman’s term, any object can be treated as a
PyObject structure because the initial segment of all objects is actually a PyObject structure.
A word on C structs.
When we say values are never declared as PyObjects but a pointer to any object can
be cast to a PyObject we are referring to an implementation detail dependent on the C
programming langugage and how it interprets data at memory locations. C structs which
are used to represent python objects are just groups of bytes which we can interpret in any
manner which choose to. For example, a struct, test, maybe composed of 5 short values
each 2 bytes in size and summing up to 10 bytes. In C, given a reference to ten bytes we can
interpret those ten bytes as test struct composed of 5 short values regardless of whether
the 10 bytes were actually defined as a test struct - however the output when you try to
access the fields of the struct maybe gibberish. This means that given n bytes of data that
represent a python object where n is greater than the size of a PyObject, we can interpret
the first n bytes as a PyObject.
The PyObject structure is shown in listing 4.0 and is composed of a number of fields that must be
filled in order for a value to be treated as an object.
Python Objects 39
The _PyObject_HEAD_EXTRA when present is a C macro that defines fields that point to the previously
allocated object and next object forming an implicit doubly linked list of all live objects. The ob_-
refcnt field is used for memory management and the *ob_type is a pointer to a type object that
indicates the type of an object. It is this type that determines what the data represents, what kind of
data it contains and the kind of operations that can be performed on that object. Take the snippet in
listing 4.1 for example, the name, name, points to a string object and the type of the object is 'str'.
A valid question from here is since the type field points to a type object then what does the *ob_type
field of that type object point to? The ob_type for a type object indeed points back at itself hence
the saying that the type of a type is type.
Types in the VM are implemented using the _typeobject data structure defined in the Objects/Ob-
ject.h module. This is a C struct with fields for mostly functions or collection of functions that
are filled in by each type. We look at this data structure next.
Python Objects 40
35 iternextfunc tp_iternext;
36
37 struct PyMethodDef *tp_methods;
38 struct PyMemberDef *tp_members;
39 struct PyGetSetDef *tp_getset;
40 struct _typeobject *tp_base;
41 PyObject *tp_dict;
42 descrgetfunc tp_descr_get;
43 descrsetfunc tp_descr_set;
44 Py_ssize_t tp_dictoffset;
45 initproc tp_init;
46 allocfunc tp_alloc;
47 newfunc tp_new;
48 freefunc tp_free;
49 inquiry tp_is_gc;
50 PyObject *tp_bases;
51 PyObject *tp_mro;
52 PyObject *tp_cache;
53 PyObject *tp_subclasses;
54 PyObject *tp_weaklist;
55 destructor tp_del;
56
57 unsigned int tp_version_tag;
58 destructor tp_finalize;
59 } PyTypeObject;
The PyObject_VAR_HEAD field is an extension of the PyObject field that was discussed in the
previous section; this extension adds an ob_size field for objects that have the notion of length.
A comprehensive description of each of the fields in this type object structure is provided in the
python C API documentation1 . The important thing to note is that the fields in the strucutre each
implement a part of the type’s behaviour. Most of these fields are part of what we can call an object
interface or protocol because they map to functions in that can be called on python objects but their
actual implementation under the covers is type dependent. For example, tp_hash field is a reference
to a hash function for a given type - this field can be left without a value in the case where instances
of the type are not hashable; whatever function is in the tp_hash field gets invoked when the hash
method is called on an instance of that type. The type object also has the field - tp_methods that
references methods unique to that type. The tp_new slot is a reference to a function that creates new
instances of the type and so on. Some of these fields such as tp_init are optional - not every type
actually needs to run an initialization function especially when the type is immutable such as tuples
while other fields such as tp_new are compulsory.
1
https://docs.python.org/3.6/c-api/typeobj.html
Python Objects 42
Also among these fields are fields for other python protocols such as the following.
1. Number protocol - A type implementing this protocol will have implementations for PyNum-
berMethods *tp_as_number field. This field is a reference to a set of functions that implement
number like operations and it means that the type will support arithmetic that have imple-
mentations included in the tp_as_number set. For example, the non-numeric set type has an
entry into this field because it supports arithmetic operations such as -, <= and so on.
2. Sequence protocol - A type that implements this protocol will have a value in the PySe-
quenceMethods *tp_as_sequence field. This means that the type will support some or all
of the sequence operations2 such as len, in etc.
3. Mapping protocol - A type that implements this protocol will have a value in the PyMapping-
Methods *tp_as_mapping. This will enable instance of such type to be treated like python
dictionaries using the dictionary subscripting syntax for setting and accessing key-value
mappings.
4. Iterator protocol - A type that implements this protocol will have a value in the getiterfunc
tp_iter and possibly the iternextfunc tp_iternext fields enabling instances of the type to
be used like python iterators.
5. Buffer protocol - A type implementing this protocol will have a value in the PyBufferProcs
*tp_as_buffer field. These functions will enable access to the instances of the type as
input/output buffers.
As we progress through this chapter, we will look at various fields that make up a type object in
more detail but for now we look at a number of different type objects as a concrete case study of
how these fields are populated in actual type objects.
2
https://docs.python.org/3.6/library/stdtypes.html#sequence-types-list-tuple-range
Python Objects 43
1 PyTypeObject PyTuple_Type = {
2 PyVarObject_HEAD_INIT(&PyType_Type, 0)
3 "tuple",
4 sizeof(PyTupleObject) - sizeof(PyObject *),
5 sizeof(PyObject *),
6 (destructor)tupledealloc, /* tp_dealloc */
7 0, /* tp_print */
8 0, /* tp_getattr */
9 0, /* tp_setattr */
10 0, /* tp_reserved */
11 (reprfunc)tuplerepr, /* tp_repr */
12 0, /* tp_as_number */
13 &tuple_as_sequence, /* tp_as_sequence */
14 &tuple_as_mapping, /* tp_as_mapping */
15 (hashfunc)tuplehash, /* tp_hash */
16 0, /* tp_call */
17 0, /* tp_str */
18 PyObject_GenericGetAttr, /* tp_getattro */
19 0, /* tp_setattro */
20 0, /* tp_as_buffer */
21 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
22 Py_TPFLAGS_BASETYPE | Py_TPFLAGS_TUPLE_SUBCLASS, /* tp_flags */
23 tuple_doc, /* tp_doc */
24 (traverseproc)tupletraverse, /* tp_traverse */
25 0, /* tp_clear */
26 tuplerichcompare, /* tp_richcompare */
27 0, /* tp_weaklistoffset */
28 tuple_iter, /* tp_iter */
29 0, /* tp_iternext */
30 tuple_methods, /* tp_methods */
31 0, /* tp_members */
32 0, /* tp_getset */
33 0, /* tp_base */
34 0, /* tp_dict */
35 0, /* tp_descr_get */
36 0, /* tp_descr_set */
37 0, /* tp_dictoffset */
38 0, /* tp_init */
39 0, /* tp_alloc */
40 tuple_new, /* tp_new */
41 PyObject_GC_Del, /* tp_free */
Python Objects 44
42 };
1. PyObject_VAR_HEAD has been initialized with a type object - PyType_Type as the type. Recall
that the type of a type object is Type. A look at the PyType_Type type object shows that the
type of PyType_Type is itself.
2. tp_name is initialized to the name of the type - tuple.
3. tp_basicsize and tp_itemsize refer to the size of the tuple object and items contained in the
tuple object and this is filled in accordingly.
4. tupledealloc is a memory management function that handles the deallocation of memory
when a tuple object is destroyed.
5. tuplerepr is the function invoked when the repr function is called with a tuple instance as
argument.
6. tuple_as_sequence is the set of sequence methods that the tuple implements. Recall that a
tuple support in, len etc sequence methods.
7. tuple_as_mapping is the set of mapping methods supported by a tuple - in this case, the key
are integer indexes only.
8. tuplehash is the function that is invoked whenever the hash of a tuple object is required. This
comes into play when tuples are used as dictionary keys or in sets.
9. PyObject_GenericGetAttr is the generic function that is invoked when referencing attributes
of a tuple object. We look at attribute referencing in subsequent sections.
10. tuple_doc is the documentation string for a tuple object.
11. tupletraverse is the traversal function for garbage collection of a tuple object. This function
is used by the garbage collector to help in the detection of reference cycle3 .
12. tuple_iter is a method that gets invoked when the iter function is called on a tuple
object. In this case, a completely different tuple_iterator type is returned so there is no
implementation for the tp_iternext method.
13. tuple_methods are the actual methods of a tuple type.
14. tuple_new is the function invoked to create new instances of tuple type.
15. PyObject_GC_Del is another field that references a memory management function.
The rest of the fields with 0 values have been left empty as they are not needed for the functionality of
a tuple. Take the tp_init field for example, a tuple is an immutable type so once created it cannot be
changed so there is no need for any initialization beyond what happens in the the function referenced
by tp_new hence this field is left empty.
3
https://docs.python.org/3.6/c-api/typeobj.html#c.PyTypeObject.tp_traverse.
Python Objects 45
The subtype argument is the type of the object being created; the args and kwds
arguments represent positional and keyword arguments of the call to the type. Note
4
Type is used rather than class for uniformity
5
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_new
Python Objects 46
that subtype doesn’t have to equal the type whose tp_new function is called; it may
be a subtype of that type (but not an unrelated type). The tp_new function should call
subtype->tp_alloc(subtype, nitems) to allocate space for the object, and then do
only as much further initialization as is absolutely necessary. Initialization that can
safely be ignored or repeated should be placed in the tp_init handler. A good rule of
thumb is that for immutable types, all initialization should take place in tp_new, while
for mutable types, most initialization should be deferred to tp_init.
This field is inherited by subtypes, except it is not inherited by static types whose tp_-
base is NULL or &PyBaseObject_Type.
We will use the tuple type from the previous section as an example of a builtin type. The tp_new field
of the tuple type references the - tuple_new method shown in listing 4.4 which handles the creation
of new tuple objects. To create a new tuple object, this function is dereferenced and invoked.
Ignoring the first and second conditions for creating a tuple in listing 4.4, we follow the third con-
dition, if (arg==NULL) return PyTuple_New(0) down the rabbit hole to find out how this works.
Overlooking the optimisations abound in the PyTuple_New function, the segment of the function
that creates a new tuple object is the op = PyObject_GC_NewVar(PyTupleObject, &PyTuple_Type,
size) call which basically allocates memory for an instance of the PyTuple_Object structure on the
heap. This is where one difference between internal representation of builtin types and user defined
types comes to the fore - instances of builtins types like tuple are actually C structs. This probably
among other things is for efficiency. So what does this C struct backing a tuple object look like?
It can be found in the Include/tupleobject.h as the PyTupleObject typedef and this is shown in
listing 4.5 for convenience.
Python Objects 47
1 typedef struct {
2 PyObject_VAR_HEAD
3 PyObject *ob_item[1];
4
5 /* ob_item contains space for 'ob_size' elements.
6 * Items must normally not be NULL, except during construction when
7 * the tuple is not yet visible outside the function that builds it.
8 */
9 } PyTupleObject;
1 class Test:
2 pass
The Test type as you would expect is an object of instance Type. To create an instance of the Test
type, the Test type is called as so - Test(). As always we can go down the rabbit hole to convince
ourselves of what happens when the type object is called. The Type type has a function reference -
type_call that fills the tp_call field and this is derefenced whenever the call notation is used on
an instance of Type. A snippet of the type_call function implementation is shown in listing 4.7.
1 ...
2 obj = type->tp_new(type, args, kwds);
3 obj = _Py_CheckFunctionResult((PyObject*)type, obj, NULL);
4 if (obj == NULL)
5 return NULL;
6
7 /* Ugly exception: when the call was type(something),
8 don't call tp_init on the result. */
Python Objects 48
Listing 4.7 shows that when an instance of the Type object is invoked, all that happens is that the tp_-
new field is dereferenced and whatever function that is referenced is invoked to get a new instance; if
the tp_init exists, this is invoked on the new instance to carry out initialization of the new instance.
This process provides an explanation for builtin types because afterall they have their own tp_new
and tp_init functions defined already but what about user defined types? Most times, a user does
not define a __new__ function for a new type (when defined this will go into the tp_new field during
class creation). The answer to this also lies with the type_new function that fills the tp_new field of
the Type. When creating a user defined type such as in the case of Test, the type_new function checks
for the presence of base types (super types/classes) and when there are none, the PyBaseObject_Type
type is added as a default base type as shown in listing 4.8.
Python Objects 49
Listing 4.8: Snippet showing how the PyBaseObject_Type is added to list of bases
...
if (nbases == 0) {
bases = PyTuple_Pack(1, &PyBaseObject_Type);
if (bases == NULL)
goto error;
nbases = 1;
}
...
This default base type which is also defined in the Objects/typeobject.c module contains some
default values for the various fields. Among these defaults are values for the tp_new and tp_init
fields. These are the values that get called by the interpeter for a user defined type. In the case where
the user defined type implements its own methods such as __init__, __new__ etc, these values are
called rather than those of the PyBaseObject_Type type.
One may notice that we have not mentioned any object structures like the tuple object structure
- tupleobject and ask - if no object structures are defined for a user defined class then how are
object instances handled and where do objects attributes that do not map to slots in the type reside?
This has to do with the tp_dictoffset field - a numeric field in type object. Instances are actually
created as PyObjects however when this offset value is non-zero in the instance type, it specifies the
offset of the instance attribute dictionary from the instance (the PyObject) itself as shown in figure
4.0 so for an instance of a Person type, the attribute dictionary can be assessed by adding this offset
value to the origin of the PyObject memory location.
For example, if an instance PyObject is at 0x10 and the offset is 16 then the instance dictionary that
contains instance attributes can be found at 0x10 + 16. This is hardly the only way instances store
their attributes as we will see in the following section.
1. For objects that have a type of Type, the tp_dict slot of type structure is a pointer to a dict
that contains values, variables and methods for that type. In the more conventional sense we
say the tp_dict field of the type object data structure is a pointer to the type or class dict.
2. For objects that have a type other than type (i.e instances of user defined types), that dict data
structure when present is located just after the PyObject structure that represents the object.
The tp_dictoffset value of the type of the object gives the offset from the start of an object
to this instance dict contains the instance attributes.
Doing a simple diction access to obtain attributes seems straightforward but this is hardly the end
of the story. Infact, searching for attributes is way more involved than just checking tp_dict value
for instance of
Type or the dict at tp_dictoffset for instances of user defined types. To get a full understanding,
we have to discuss the descriptor protocol - a protocol that is at the heart of attribute referencing in
python.
The Descriptor HowTo Guide6 is an excellent introduction to descriptors but a cursory description
of descriptors is provided here. Simply put, a descriptor is an object that implements the __get__,
__set__ or __delete__ special methods of the descriptor protocol. The signature for each of these
methods in python is shown in listing 4.9.
Objects implementing only the __get__ method are non-data descriptors so they can only be read
from after initialization while objects implementing the __get__ and __set__ are data descriptors
meaning that such descriptor objects are writeable. We are interested in descriptors and their
application in representing object attributes. The TypedAttribute descriptor in listing 4.10 is an
example of a descriptor used to represent an object attribute.
6
https://docs.python.org/3/howto/descriptor.html
Python Objects 51
def __set__(self,instance,value):
if not isinstance(value,self.type):
raise TypeError("Must be a %s" % self.type)
setattr(instance,self.name,value)
def __delete__(self,instance):
raise AttributeError("Can't delete attribute")
The TypedAttribute descriptor class enforces rudimentary type checking for any attribute of a class
which it is used to represent. It is important to note that descriptors are effective in this kind of case
only when defined at the class level rather than instance level i.e. in __init__ method as shown in
listing 4.11.
def name_balance_str(self):
return str(self.name) + str(self.balance)
If one thinks carefully about it, it only makes sense for this kind of descriptor to be defined at the type
level because if defined at the instance level then any assignment to the attribute will overwrite the
descriptor. One has to go through the python vm source code to get an appreciation for how integral
descriptors are to Python. Descriptors provide the mechanism behind properties, static methods,
class methods and a host of other functionality in Python. To give a concrete illustration of the
importance of descriptors, consider the algorithm for resolving an attribute from an instance, b, of
a user defined type as shown in listing 4.12.
Listing 4.12: Algorithm for find a referenced attribute in an instance of a user defined type
1. type(b).__dict__ is searched for the attribute name. If the name is
found and it is a data descriptor, the result of calling the descriptor's
__get__ method is returned. If the name is not found, then all base classes
in the *mro* of type(b) are searched in the same way.
2. b.__dict__ is searched and if attribute name is found here, it is
returned.
3. if the name from 1 is a non-data descriptor the value of call
__get__ is returned,
4. If the name is not found, an AttributeError is raised or __getattr__()
is called if provided by the user defined type.
The algorithm in listing 4.12 shows that during attribute referencing we first check for descriptor
objects; it also illustrates how the TypedAttribute descriptor is able to represent an attribute of an
object - whenever an attribute is referenced such as b.name the Account type object is searched for
the attribute and in this case, a TypedAttribute descriptor is found and its __get__ method is called
accordingly. The TypedAttribute example illustrates a descriptor but is rather contrived; to get a
real touch of how important descriptors are to the core of the language, we consider some examples
that show how they are applied.
Do note that the attribute reference algorith in listing 4.12 differs from the algorithm that is used
when referencing an attribute whose type is type. The algorithm for such is shown in listing 4.13
>> a = Account()
>> a.name_balance_str
<bound method Account.name_balance_str of <__main__.Account object at
0x102a0ae10>>
>> Account.name_balance_str
<function Account.name_balance_str at 0x102a2b840>
Looking at the snippet from listing 4.14, although we seem to reference the same attribute, the
actual objects returned are different in value and type. When referenced from the account type, the
returned value is a function type but when referenced from an instance of the account type, the
result is a bound method type. This is possible because functions are descriptors too. The definition
of a function object type is shown in listing 4.15.
PyTypeObject PyFunction_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"function",
sizeof(PyFunctionObject),
0,
(destructor)func_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
(reprfunc)func_repr, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
Python Objects 54
function_call, /* tp_call */
0, /* tp_str */
0, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
func_doc, /* tp_doc */
(traverseproc)func_traverse, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
offsetof(PyFunctionObject, func_weakreflist), /* tp_weaklistoffset */
0, /* tp_iter */
0, /* tp_iternext */
0, /* tp_methods */
func_memberlist, /* tp_members */
func_getsetlist, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
func_descr_get, /* tp_descr_get */
0, /* tp_descr_set */
offsetof(PyFunctionObject, func_dict), /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
func_new, /* tp_new */
};
The function object fills in the tp_descr_get field with a func_descr_get function thus instances of
the function type are non-data descriptors. Listing 4.16 shows the implementation of the funct_-
descr_get method.
The func_descr_get can be invoked during either type attribute resolution or instance attribute
resolution as described in the previous section. When invoked from a type, the call to the
Python Objects 55
When a method is defined in a class, we use the self argument as the first parameter to
any instance method because in reality, instance methods take the instance (called self
by convention) as the first argument. A call such as b.name_balance_str() is actually
the samething as type(b).name_balance_str(b). The reason why we are able to invoke
b.name_balance_str() is because the value returned by b.name_balance_str is a method
object which is a thin wrapper around the name_balance_str with the instance already
bound to the method instance. So when we make a call such as b.name_balance_str() the
method uses the bound instance as an argument to the wrapped function hiding this detail
from us.
In another instance of the importance of descriptors, consider the snippet in listing 4.17 which shows
the result of accessing the __dict__ attribute from both an instance of the builtin type and an
instance of a user defined type.
Listing 4.17: Accesing the __dict__ attribute from an instance of the builtin type and an instance of a user defined
type
class A:
pass
>>> A.__dict__
mappingproxy({'__module__': '__main__', '__doc__': None, '__weakref__': \
<attribute '__weakref__' of 'A' objects>, '__dict__': <attribute '__dict__' of '\
A' objects>})
>>> i = A()
>>> i.__dict__
{}
>>> A.__dict__['name'] = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
>>> i.__dict__['name'] = 2
>>> i.__dict__
{'name': 2}
>>>
Python Objects 56
Observe from listing 4.17 that both objects do not return the vanilla dictionary type when the _-
_dict__ attribute is referenced. The type object seems to return a mapping proxy that we cannot
even assign to while the instance of type returns a vanilla dictionary mapping that supports all the
usual dictionary functions. So it seems that attribute referencing is done differently for these objects.
Recall the algorithm described for attribute search from a couple of sections back. The first step is
to search the __dict__ of the type of the object for the attribute so we go ahead and do this for both
object in listing 4.18.
We see that the __dict__ attribute is represented by data descriptors for both objects and hence
the reason why we can get different object types. We would like to find out what happens under
the covers for this descriptors just as we did in the case of functions and bound methods. A good
place to start is the Objects/typeobject.c module and the definition for the type type object. An
interesting field is the tp_getset field that contains an array of C structs (PyGetSetDef values) which
are shown in listing 4.19. This is the collection of values for which descriptor objects will be insert
into the type's type __dict__ attribute - this is the mapping that the tp_dict slot of the type object
points to.
These values are not the only ones for which descriptors are insert into the type dict, there are other
values such as the tp_members and tp_methods values which have descriptors created and insert
into the tp_dict during type initialization. The insertion of these values into the dict happens when
PyType_Ready function is called on the type. As part of the PyType_Ready function initialization
Python Objects 57
process, descriptor objects are created for each entry in the type_getsets and then added into the
tp_dict mapping - the add_getset function in the Objects/typeobject.c handles this.
Returning to our __dict__, attribute, we know that after initialization of the type, the __dict__-
attribute exists in the tp_dict field of the type so lets see what the getter function of this descriptors
does. The getter function is the type_dict function shown in listing 4.20.
The tp_getattro field points to the function that is the first port of call for getting attributes for any
object. For the type object, it points to the type_getattro function. This method in turn implements
the attribute search algorithm as described in listing 4.13. The function invoked by the descriptor
found in the type dict for the __dict__ attribute is the type_dict function given in listing 4.19 and it
is pretty easy to understand. The return value is of interest to us here; it is a dictionary proxy to the
actual dictionary that holds the type attribute; this explains the mappingproxy type that is returned
when the __dict__ attribute of a type object is queried.
So what about the instance of A, a user defined type, how is the __dict__ attribute resolved? Now
recall that A is actually an object of type type so we go hunting in the Object/typeobject.c module
to see how new type instance are created. The tp_new slot of the PyType_Type contains the type_new
function that handles the creation of new type objects. Perusing through all the type creation code
in the function one stumbles on the snippet in listing 4.21.
Assuming the first conditional is true as, the tp_getset field is filled with the value shown in listing
4.22.
Python Objects 58
When (*tp->tp_getattro)(v, name) is invoked, the tp_getattro field which contains a pointer to
the PyObject_GenericGetAttr is called. This function is responsible for implementing the attribute
search algorithm for a user-defined types. In the case of the __dict__ attribute, the descriptor is
found in the object type’s dict and the __get__ function of the descriptor is the subtype_dict
function defined for the __dict__ attribute from listing 4.21. The subtype_dict getter function is
shown in listing 4.23.
Listing 4.23: The getter function for __dict__ attribute of a user-defined type
base = get_builtin_base_with_dict(Py_TYPE(obj));
if (base != NULL) {
descrgetfunc func;
PyObject *descr = get_dict_descriptor(base);
if (descr == NULL) {
raise_dict_descr_error(obj);
return NULL;
}
func = Py_TYPE(descr)->tp_descr_get;
if (func == NULL) {
raise_dict_descr_error(obj);
return NULL;
}
return func(descr, obj, (PyObject *)(Py_TYPE(obj)));
}
return PyObject_GenericGetDict(obj, context);
}
invoked. The PyObject_GenericGetDict and an associated helper that actually fetches the instance
dict are shown in listing 4.24. The actual get the dict function is the _PyObject_GetDictPtr function
that queries the object for its dictoffset and uses that to compute the address of the the instance
dict. In a situation where this function returns a null value, PyObject_GenericGetDict can go ahead
to return a new dict to the calling function.
dictoffset = tp->tp_dictoffset;
if (dictoffset == 0)
return NULL;
if (dictoffset < 0) {
Py_ssize_t tsize;
size_t size;
dictoffset += (long)size;
assert(dictoffset > 0);
assert(dictoffset % SIZEOF_VOID_P == 0);
}
return (PyObject **) ((char *)obj + dictoffset);
}
This explanation succinctly sums up how descriptors are used to implement custom attribute access
depending on types. The same strategy described above is used throughout the VM for other
instances in which attribute access is performed using descriptors. Descriptors are pervasive within
the VM; __slots__, static and class methods, properties are just some further examples of language
features that are made possible by the use of descriptors.
C + (C1 C2 ... CN) = C C1 C2 ... CN denotes the sum of the lists [C] +
[C1, C2, ... ,CN].
7
https://www.python.org/download/releases/2.3/mro/
8
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.3910&rep=rep1&type=pdf
Python Objects 61
Consider a type C in a multiple inheritance hierarchy, with C inheriting from the base types B1, B2,
... , BN, the linearization of C is the sum of C plus the merge of the linearizations of the parents
and the list of the parents - L[C(B1 ... BN)] = C + merge(L[B1] ... L[BN], B1 ... BN). The
linearization of the object type which has no parents is trivial - L[object] = object. The merge
operation is calculated according to the following algorithm9 :
take the head of the first list, i.e L[B1][0]; if this head is not in the tail of any of the
other lists, then add it to the linearization of C and remove it from the lists in the merge,
otherwise look at the head of the next list and take it, if it is a good head. Then repeat
the operation until all the class are removed or it is impossible to find good heads. In
this case, it is impossible to construct the merge, Python 2.3 will refuse to create the
class C and will raise an exception.
Some type hierarchies cannot be linearized using this algorithm and in such cases the VM throws
an error and does not create such hierarchies.
Assuming we have an inheritance hierarchy such as that shown in figure 4.1, the algorithm for
creating the mro would proceed as follows starting from the top of the hierarchy with O, A, and B.
The linearizations of O, A and B are trivial:
9
https://www.python.org/download/releases/2.3/mro/
Python Objects 62
Listing 4.26: Calculating linearization for types O, A and B from figure 4.1
L[O] = O
L[A] = A O
L[B] = B O
L[Y] = Y + merge(AO,BO,AB)
= Y + A + merge(O,BO,B)
= Y + A + B + merge(O,O)
= Y A B O
With linearizations for X and Y computed, we can now compute that for Z as shown in listing 4.28.
L[Z] = Z + merge(XABO,YABO,XY)
= Z + X + merge(ABO,YABO,Y)
= Z + X + Y + merge(ABO,ABO)
= Z + X + Y + A + merge(BO,BO)
= Z + X + Y + A + B + merge(O,O)
= Z X Y A B O
5. Code Objects
In this part of the write-up, we explore code objects. Code objects are central to the operation of the
python virtual machine. Code objects encapsulate the bytecode of the python virtual machine; we
may call the bytecode the assembly language for the python virtual machine.
Code objects as the name suggests represent compiled executable python code. We have seen code
objects before when we discussed the compilation of python source. Code objects are produced
whenever a block of python code is compiled. As described in the brilliant python documentation1
A Python program is constructed from code blocks. A block is a piece of Python program
text that is executed as a unit. The following are blocks: a module, a function body, and a
class definition. Each command typed interactively is a block. A script file (a file given
as standard input to the interpreter or specified as a command line argument to the
interpreter) is a code block. A script command (a command specified on the interpreter
command line with the ‘-c’ option) is a code block. The string argument passed to the
built-in functions eval() and exec() is a code block.
The code object contains runnable bytecode instructions that when run alter the state of the python
virtual machine. Given a function, we can access the code object for the function body using the
__code__ attribute of the function as in the following snippet.
def return_author_name():
return "obi Ike-Nwosu"
>>> return_author_name.__code__
<code object return_author_name at 0x102279270, file "<stdin>", line 1>
For other code blocks, one can obtain the code objects for that code block by compiling such code.
The compile2 function provides a facility for this in the python interpreter. The code objects come
with a number of fields that are used by the interpreter loop when executing and we look at some
of these fields in the following sections.
1
https://docs.python.org/3/reference/executionmodel.html
2
https://docs.python.org/3.6/library/functions.html#compile
Code Objects 64
co_argcount = 1
co_cellvars = ()
co_code = b'|\x00d\x01\x16\x00d\x02k\x02r\x1e|\x00d\x03\x16\x00d\x02k\x02r\x\
1ed\x04S\x00n,|\x00d\x01\x16\x00d\x02k\x02r0d\x05S\x00n\x1a|\x00d\x03\x16\x00d\x\
02k\x02rBd\x06S\x00n\x08t\x00|\x00\x83\x01S\x00d\x00S\x00'
co_consts = (None, 3, 0, 5, 'FizzBuzz', 'Fizz', 'Buzz')
co_filename = /Users/c4obi/projects/python_source/cpython/fizzbuzz.py
co_firstlineno = 6
co_flags = 67
co_freevars = ()
co_kwonlyargcount = 0
co_lnotab = b'\x00\x01\x18\x01\x06\x01\x0c\x01\x06\x01\x0c\x01\x06\x02'
co_name = fizzbuzz
co_names = ('str',)
co_nlocals = 1
co_stacksize = 2
co_varnames = ('n',)
The fields printed out seem almost self-explanatory except for the co_lnotab and co_code fields
that seem to contain gibberish. We go ahead and explain each of these fields and their importance
o the python virtual machine.
1. co_argcount: This is the number of arguments to a code block. This has a value only for
function code blocks. The value is set during the compilation process to the length of the
argument set of the code blocks’s AST. The evaluation loop makes use of these variables during
the set-up for code evaluation to carry out sanity checks such as checks that all arguments
are present and for storing locals.
2. co_code: This holds the sequence of bytecode instructions that is executed by the evaluation
loop. Each of these bytecode instruction sequence is composed of an opcode and an oparg -
argument to the opcode where it exists. For example, co.co_code[0] returns the first byte of
instruction, 124 that maps to a python LOAD_FAST opcode.
3. co_consts: This field is a list of constants like string literals and numeric values contained
within the code object. The example, from above shows the content of this field for the
fizzbuzz function. The values contained in this list are integral to code execution as they
Code Objects 65
are the values referenced by the LOAD_CONST opcode. The operand argument to a bytecode
instruction such as the LOAD_CONST is the index into this constant list. For example consider
the co_consts value of (None, 3, 0, 5, 'FizzBuzz', 'Fizz', 'Buzz') for the FizzBuzz
function and contrast with the disassembled code object below.
...
66 LOAD_GLOBAL 0 (str)
68 LOAD_FAST 0 (n)
70 CALL_FUNCTION 1
72 RETURN_VALUE
74 LOAD_CONST 0 (None)
76 RETURN_VALUE
Recall that during the compilation process, a return None is added if there is no return
statement at the end of a function so we can tell that the bytecode instruction at offset 74
is a LOAD_CONST for a None value. The argument to the opcode is a 0 and we can see that
the None value has an index of 0 in the constants list from where it is actually loaded by the
LOAD_CONST instruction.
4. co_filename: This field as the name suggests contains the name of the file that contains the
source code from which the code object was created.
5. co_firstlineno: This gives the line number on which the source for the code object begins. This
value plays quite an important role during activities such as debugging code.
6. co_flags: This field indicates the kind of code object. For example, when the code object is that
of a coroutine, the flag is set to 0x0080. There are other flags such as CO_NESTED that indicates
if a code object is nested within another code block, CO_VARARGS that indicates if a code block
has variable arguments and so on. These flags affect the behaviour of the evaluation loop
during bytcode execution.
7. co_lnotab: The contains a string of bytes that are used to compute the source line numbers
that an instruction at a bytecode offset corresponds to. For example, the dis function makes
use of this when computing line numbers for instructions.
Code Objects 66
8. co_varnames: This is the number of names that are locally defined in a code block. Contrast
this with co_names.
9. co_names: This a collection of non-local names that are used within the code object. For
example, the snippet in listing 5.4 references a non-local variable, p.
Listing 5.4: Illustrating local and non-local names
def test_non_local():
x = p + 1
return x
The result of introspecting on the code object for the function shown in listing 5.4 is shown
in listing 5.5.
From this example, the difference between the c_names and co_varnames is obvious. co_-
varnames references the locally defined names while co_names references non-locally defined
names. Do note that it is only during execution of the program that if the name p is not found,
an error is thrown. The bytecode instructions for the function in listing 5.4 is shown in listing
5.6 and it is obvious how this works.
Code Objects 67
Note how rather than a LOAD_FAST as was seen in the previous example, we have LOAD_GLOBAL
instruction. Later when we discuss the evaluation loop, we will discuss an optimisation that
the evaluation loop carries out that makes the use of the LOAD_FAST instruction as the name
suggests.
10. co_nlocals: This is a numeric value that represents the number of local names that the code
object makes use of. In the immediate past example from listing 5.4, the only local variable
used is x and thus this value is 1 for the code object of that function.
11. co_stacksize: The python virtual machine is a stack based machine i.e values used in
evaluation and results of evaluation are read from and written to an execution stack. This
co_stacksize value is the maximum number of items that exist on the evaluation stack at
any point during the execution of the code block.
12. co_freevars: The co_freevars field is a collection of free variables defined within the code
block. This field is mostly relevant to nested functions that form closures. Free variables are
variables that are used within a block but not defined within that block; this does not apply to
global variables. The concept of a free variable is best illustrated with an example as shown
in listing 5.7.
Listing 5.7: A simple nested function
def f(*args):
x=1
def g():
n = x
The co_freevars field is empty for the code object of the f function while that of the g function
contains the x value. Free variables are strongly interrelated with cell variables.
13. co_cellvars: The co_cellvars field is a collection of names for which cell storage objects have
to be created during the execution of a code object. Take the snippet in listing 5.7, the co_-
cellvars field of the code object for the function - f, contains just the name -x while that of
the nested function’s code object is empty; recall from the discussion on free variables that
the co_freevars collection of the nested function’s code object consists of just this name - x.
This captures the relationship between cell variables and free variables - a free variable in a
nested scope is a cell variable within the enclosing scope. Special cell objects are created for
storing values in this cell variable collection during the execution of the code object. This is
Code Objects 68
so because each value in this field is used by nested code objects whose life time may exceed
that of the enclosing code object so such values have to be stored in other locations that do
not get deallocated when the execution of the code object is completed.
b'|\x00d\x01\x16\x00d\x02k\x02r\x1e|\x00d\x03\x16\x00d\x02k\x02r\x1ed\x04S\x00n,\
|\x00d\x01\x16\x00d\x02k\x02r0d\x05S\x00n\x1a|\x00d\x03\x16\x00d\x02k\x02rBd\x06\
S\x00n\x08t\x00|\x00\x83\x01S\x00d\x00S\x00'
To get a human readable version of the byte string, we use the dis function from the dis module to
extract a human readable printout as shown in listing 5.8.
7 0 LOAD_FAST 0 (n)
2 LOAD_CONST 1 (3)
4 BINARY_MODULO
6 LOAD_CONST 2 (0)
8 COMPARE_OP 2 (==)
10 POP_JUMP_IF_FALSE 30
12 LOAD_FAST 0 (n)
14 LOAD_CONST 3 (5)
16 BINARY_MODULO
18 LOAD_CONST 2 (0)
20 COMPARE_OP 2 (==)
22 POP_JUMP_IF_FALSE 30
...
The first column of the output shows the line number for that instruction. Multiple instructions may
map to the same line number. This value is calculated using information from the co_lnotab field of a
code object. The second column is the offset of the given instruction from the start of the bytecode.
Assuming the bytecode string is an contained in an array, then this value is the index into that
array at which the given instruction can be found. The third column is the actual human readable
instruction opcode; the full range of opcodes can be found in the Include/opcode.h module. The
fourth column is the argument to the instruction.
The first LOAD_FAST instruction takes the argument 0. This value is the index into the co_varnames
array. The last column is the value of the argument - provided by the dis function for ease of
use. Some arguments do not take explicit arguments. Notice that the BINARY_MODULO and RETURN_-
VALUE instructions take no explicit argument. Recall that the python virtual machine is a stack based
machine so these instructions read values from the top of the stack.
Bytecode instructions are two bytes in size - one byte for the opcode and the second byte for the
argument to the opcode. In the case where the opcode does not take an argument then the second
argument byte is zeroed out. The Python virtual machine uses a little endian byte encoding on the
machine which I am currently typing out this book thus the 16 bits of code are structured as shown
in figure 5.0 with the opcode taking up the higher 8 bits and the argument to the opcode taking up
the lower 8 bits.
Sometimes, the argument to an opcode maybe unable to fit into the default single byte. For these kind
of arguments, the python virtual machine makes use of the EXTENDED_ARG opcode. What the python
virtual machine does is to take an argument that is too large to fit into a single byte and split it into
two (we assume that it can fit into two bytes here but this logic is easily extended past two bytes)
- the most significant byte is an argument to the EXTENDED_ARG opcode while the least significant
byte is the argument to its actual opcode. The EXTENDED_ARG opcode(s) will come before the actual
opcode in the sequence of opcodes and the argument can then be rebuilt by shifts to the right and
or’ing with other sections of the argument. For example, if one wanted to pass the value 321 as
argument to the LOAD_CONST opcode, this value cannot fit into a single byte so the EXTENDED_ARG
opcode is used. The binary represenation of this value is 0b101000001 so the actual do work opcode
(LOAD_CONST) takes the first byte (1000001) as argument (65 in decimal) while the EXTENDED_ARG
opcode takes the next byte (1) as argument thus we have (144, 1), (100, 65) as the sequence of
instructions that is output.
The documentation for the dis module3 contains a comprehensive list and explanation of all opcodes
currently implemented by the virtual machine.
3
https://docs.python.org/3.6/library/dis.html
Code Objects 70
def f():
print(c)
a = 1
b = 3
def g():
print(a+b)
c=2
def h():
print(a+b+c)
When the module code block is compiled we get the output shown in listing 5.10.
The instruction at byte offset 0 loads a code object that is stored as the name f - our function
definition using the MAKE_FUNCTION Instruction. The content of this code object is shown in listing
5.11
Code Objects 71
Listing 5.11: Bytecode instruction disassembly for nested function from listing 5.9
co_argcount = 0
co_cellvars = ()
co_code = b'd\x00d\x01\x84\x00Z\x00d\x02S\x00'
co_consts = (<code object f at 0x1022029c0, file "fizzbuzz.py", line 1>, 'f'\
, None)
co_filename = fizzbuzz.py
co_firstlineno = 1
co_flags = 64
co_freevars = ()
co_kwonlyargcount = 0
co_lnotab = b''
co_name = <module>
co_names = ('f',)
co_nlocals = 0
co_stacksize = 2
co_varnames = ()
As would be expected in a module, the fields related to code object arguments are all zero - (co_-
argcount, co_kwonlyargcount). The co_code field contains bytecode instructions as shown in listing
5.10. The co_consts field is an interesting one. The constants in the field are a code object and
the names - f and None. The code object is that of the function, the value ‘f’ is the name of the
function and None is the return value of the function - recall the python compiler adds a return
None statement to a code object without one.
Notice that function objects are not actually created during the compilation of the module. What
we have are just code objects - the functions are actually created during the execution of the code
objects as seen in listing 5.10. Inspecting the attributes of the code object will actually show that it
is also composed of other code objects as shown in listing 5.12.
Listing 5.12: Bytecode instruction disassembly for nested function from listing 5.10
co_argcount = 0
co_cellvars = ('a', 'b')
co_code = b't\x00t\x01\x83\x01\x01\x00d\x01\x89\x00d\x02\x89\x01\x87\x00\x87\
\x01f\x02d\x03d\x04\x84\x08}\x00d\x00S\x00'
co_consts = (None, 1, 3, <code object g at 0x101a028a0, file "fizzbuzz.py", \
line 5>, 'f.<locals>.g')
co_filename = fizzbuzz.py
co_firstlineno = 1
co_flags = 3
co_freevars = ()
Code Objects 72
co_kwonlyargcount = 0
co_lnotab = b'\x00\x01\x08\x01\x04\x01\x04\x01'
co_name = f
co_names = ('print', 'c')
co_nlocals = 1
co_stacksize = 3
co_varnames = ('g',)
The same logic explained earlier on applies here with the function object being created only during
the execution of the code object.
typedef struct {
PyObject_HEAD
int co_argcount; /* #arguments, except *args */
int co_kwonlyargcount; /* #keyword only arguments */
int co_nlocals; /* #local variables */
int co_stacksize; /* #entries needed for evaluation stack */
int co_flags; /* CO_..., see below */
int co_firstlineno; /* first source line number */
PyObject *co_code; /* instruction opcodes */
PyObject *co_consts; /* list (constants used) */
PyObject *co_names; /* list of strings (names used) */
PyObject *co_varnames; /* tuple of strings (local variable names) */
PyObject *co_freevars; /* tuple of strings (free variable names) */
PyObject *co_cellvars; /* tuple of strings (cell variable names) */
The fields are almost all the same as those found in a python code objects except for the co_stack-
size, co_flags, co_cell2arg, co_zombieframe, co_weakreflist and co_extra. co_weakreflist
and co_extra are not really interesting fields at this point. The rest of the fields here pretty much
serve the same purpose as those in the code object. The co_zombieframe is a field that exist for
optimisation purposes. This holds a reference to a frame object that was previously used as a context
to execute the code object. This is used as the execution frame when such code object is being re-
executed in order to prevent the overhead of allocating memory for another frame object.
6. Frames Objects
Code objects contain the executable byte code but lack the contextual information required for the
execution of such code. Take the set of bytecode instructions in listing 6.0 for example, LOAD_COST
takes an index as argument but the code object has no array or data structure that contains data to
load the value at the index from.
Another data structure that provides such contextual information is required for the execution of
code objects and this is where frame objects come in. One can think of the frame object as a container
in which the code object is executed - it knows about the code object and has references to data and
values that are required during the execution of some code object. As usual, python does indeed
provide us with some facilities to inspect frame objects using the sys._getframe() function as
shown in the listing 6.1 snippet.
Before a code object can be executed, a frame object within which the execution of such a code
object takes place has to be created. Such a frame object contains all the namespaces required for
execution of a code object (local, global, and builtin), a reference to the current thread of execution,
stacks for evaluating byte code and other housekeeping information that are important for executing
byte code. To get a better understanding of the frame object, let us look at the definition of the frame
object data structure from the Include/frame.h module and reproduced in listing 6.2.
The fields coupled with the documentation within the frame are not difficult to understand but we
provide a bit more detail about these fields and how they relate to the execution of bytecode.
1. f_back: This field is a reference to the frame of the code object that was executing prior to the
current code object. Given a set of frame objects, the f_back fields of these frames together
form a stack of frames that goes all the way back to the initial frame. This initial frame then
has a NULL value in this f_back field. This implicit stack of frames forms what we refer to as
the call stack.
2. f_code: This field is a reference to a code object. This code object contains the bytecode that
is executed within the context of this frame.
Frames Objects 76
3. f_builtins: This is a reference to the builtin namespace. This namespace contains names
such as print, enumerate etc and their corresponding values.
4. f_globals: This is a reference to the global namespace of a code object.
5. f_locals: This is a reference to the local namespace of a code object. As previously mentioned
these names have been defined within the scope of a function. When we discuss the f_-
localplus field, we will see an optimization that python does when working with locally
defined names.
6. f_valuestack: This is a reference to the evaluation stack for the frame. Recall that the python
virtual machine is a stack-based virtual machine so during evaluation of bytecode, values are
read from the top of a stack and results from the evaluation of byte code are stored on the top
of a stack. This field is the stack that is used during code object execution. The stacksize of
a frame’s code object gives the maximum depth to which this data structure can grow to.
7. f_stacktop: As the name suggests, the field points to the next free slot of the evaluation
value stack. When a frame is newly created, this value is set to the value stack - this is the
first available space on the stack as there are no items on the stack.
8. f_trace: This field references a function that is used for tracing the execution of python code.
9. f_exc_type, f_exc_value, f_exc_traceback, f_gen: are fields that are used for book keeping
in order to be able to cleanly execute generator code. More on this when we discuss python
generators.
10. f_localplus: This is a reference to an array that contains enough space for storing cell and
local variables. This field provides a mechanism for the evaluation loop to optimize loading
and storing values of names to and from the value stack with the LOAD_FAST and STORE_FAST
instructions. The LOAD_FAST and STORE_FAST opcodes provides faster name access than their
counterpart LOAD_NAME and STORE_NAME opcodes because they use array indexing for accessing
value of names and this is done in approximately constant time unlike their counterparts that
search a mapping for the value associated with a given name. When we discuss the evaluation
loop, we see how this value is set up during the frame bootstrapping process.
11. f_blockstack: This field references a data structure that acts as a stack which is used to handle
loops and exception handling. This is the second stack in addition to the value stack that is
of utmost importance to the virtual machine but this does not receive as much attention as it
rightfully should. The relationship between the block stack, exceptions and looping constructs
is quite complex and we look at that in the coming chapters.
First, code objects have a field, the co_zombieframe which references an inert frame object. When a
code object is executed, the frame within which it was executed is not immediately deallocated. The
frame is rather maintained in the co_zombieframe so when next the same code object executed, time
is not spent allocating memory for a new execution frame. The ob_type, ob_size, f_code, f_val-
uestack fields retain their value; f_locals, f_trace, f_exc_type, f_exc_value, f_exc_traceback
are NULL and f_localplus retains its allocated space but with the local variables nulled out. The
remain fields do not hold a reference to any object. The second optimization that is used by the
virtual machine is to maintain a free list of pre-allocated frame objects from which frames can be
obtained for the execution of code objects.
The source code for frame objects is actually a gentle read and one can see how the zombie frame
and freelist concepts are implemented by looking at how allocated frames are deallocated after the
execution of the enclosed code object. The interesting part of the code for frame deallocation is
shown in listing 6.3.
if (co->co_zombieframe == NULL)
co->co_zombieframe = f;
else if (numfree < PyFrame_MAXFREELIST) {
++numfree;
f->f_back = free_list;
free_list = f;
}
else
PyObject_GC_Del(f);
Careful observation shows that the freelist will only ever grow when a recursive call is made i.e
a code object tries to execute itself as that is the only time the zombieframe field is NULL. This
tiny optimization of using the freelist helps eliminate to a certain degree, the repeated memory
allocations for such recursive calls.
This chapter covers the main points about the frame object without delving into the evaluation loop
which is tightly integrated with the frame objects. There are still a few things that have been left
out of this discussion but which we cover in subsequent chapters. For example,
1. How are values passed on from one frame to the next when code execution hits a return
statement?
2. What is the thread state and where does the thread state come from?
3. How are exceptions bubble down the stack of frames when an exception is thrown in the
executing frame? etc.
Most of these question will be answered when we look at the very important interpreter and thread
state data structures in the next chapter and then the evaluation loop in subsequent chapters.
7. Interpreter and Thread States
As disucussed in opening chapters, one of the steps during the bootstrapping of the python
interpreter is the initialisation of the interpreter state and thread state data structures. In this chapter,
we look at these data structures in detail and explain the importance of these data structures.
PyObject *modules;
PyObject *modules_by_index;
PyObject *sysdict;
PyObject *builtins;
PyObject *importlib;
PyObject *codec_search_path;
PyObject *codec_search_cache;
PyObject *codec_error_registry;
int codecs_initialized;
int fscodec_initialized;
...
PyObject *builtins_copy;
Interpreter and Thread States 79
PyObject *import_func;
} PyInterpreterState
The fields shown in listing 7.0 maybe familiar to anyone that has covered all the material up to this
point and has used python for a considerable amount of time. We discuss some of the fields of the
interpreter state data structure once again.
• *next : There can be multiple interpreter states within a single OS process that is running a
python executable. This *next field references another interpreter state data structure within
the python process if such exist and these form a linked list of interpreter states as shown
in figure 7.0. Each interpreter state has its own set of variables that will be used by a thread
of execution that references that interpreter state. The memory and Global Interpreter Lock
available to the process is however shared by all interpreter threads within that process.
The remaining fields are variables that are shared by all cooperating threads of the interpreter
state. The modules field is a table of installed python modules - we see how the interpreter finds
these modules later on when we discuss the import system, the builtins field is a reference to the
built-in sys module. The content of this module is the is the set of builtin functions such as len,
enumerate etc and the Python/bltinmodule.c module contains implementations for most of the
contents of the module. The importlib is a field that references the implementation of the import
mechanism - we speak a bit more about this when we discuss the import system in detail. The
*codec_search_path, *codec_search_cache, *codec_error_registry, *codecs_initialized and
*fscodec_initialized are fields that all relate to codecs that python uses to encode and decode
bytes and text. The values in these fields are used to locate such codecs as well as handle errors
that maybe related to using such codecs. An executing python program is composed of one or more
threads of execution. The interpreter has to maintain some state for each thread of execution and it
is able to do this by maintaining a thread state data structure for each thread of execution. We look
at this data structure next.
Interpreter and Thread States 80
Py_tracefunc c_profilefunc;
Py_tracefunc c_tracefunc;
PyObject *c_profileobj;
PyObject *c_traceobj;
PyObject *curexc_type;
PyObject *curexc_value;
PyObject *curexc_traceback;
PyObject *exc_type;
PyObject *exc_value;
PyObject *exc_traceback;
...
} PyThreadState;
The next and previous fields of a thread state data structure reference threads states created prior
to and just after the given thread state. These fields form a doubly linked list of thread states that
share a single interpreter state. The interp field references the interpreter state that the thread state
belongs to. The frame references the current frame of execution; as the code object that is executed
changes, the value referenced by this field also changes.
Interpreter and Thread States 81
The recursion_depth as the name suggest specifies how deep the stack frame should get during
a recursive call. The overflowed flag is set when the stack overflows - after a stack overflow, the
thread allows 50 more calls to enable some clean-up operations. The recursion_critical flag is
used to signal to the thread that the code being executed should not overflow. The tracing and
use_tracing flag are related to functionality for tracing the execution of the thread. The *curexc_-
type, *currexc_value, *curexc_traceback, *exc_type, *exc_value and *curexc_traceback are
fields that are all used in the exception handling process as will be seen in subsequent chapters.
It is important to understand the difference between the thread state and an actual thread. The thread
state is just a data structure that encapsulates some state for an executing thread. Each thread state is
assocated with a native OS thread within the running python process. Figure 7.1 is a good graphical
illustration of this relationship. We can clearly see that a single python process is home to at least
one interpreter state and each interpreter state is home to one or more thread states and each of
these thread states maps to an operation system thread of execution.
Operating System threads and associated python thread states are created either during the
initialization of the interpreters or when invoked by the threading module. Even with multiple
threads alive within a python process, only one thread can actively carry out CPU bound tasks
at any given time. This is because an executing thread must hold the GIL in order to execute byte
code within the python virtual machine. This chapter will not be complete without a look at the
famous or infamous GIL concept so we take this on in the next section
the GIL to run but as we will see, all such a thread can actually do is wait to get the GIL and only
when it holds the GIL is it able to execute bytecode. We take a look at this whole process.
At the interpreter startup, a single main thread of execution is created and there is no contention for
the GIL as there is no other thread around so the main thread does not bother to acquire the lock.
When another thread is spawned using the python threading module, the GIL comes into play. The
snippet in listing 7.3 is from the Modules/_threadmodule.c and provides an idea of how this process
proceeds as a new thread is created.
boot->interp = PyThreadState_GET()->interp;
boot->func = func;
boot->args = args;
boot->keyw = keyw;
boot->tstate = _PyThreadState_Prealloc(boot->interp);
if (boot->tstate == NULL) {
PyMem_DEL(boot);
return PyErr_NoMemory();
}
Py_INCREF(func);
Py_INCREF(args);
Py_XINCREF(keyw);
PyEval_InitThreads(); /* Start the interpreter's thread-awareness */
ident = PyThread_start_new_thread(t_bootstrap, (void*) boot);
Interpreter and Thread States 83
The snippet in listing 7.3 is from the thread_PyThread_start_new_thread function that is invoked
to create a new thread. boot is a data structure the contains all the information that a new thread
needs to execute. The _PyThreadState_Prealloc function call creates a new thread state for the
thread which has not yet been created. Before the thread is actually created, the main thread of
execution must acquire the GIL; the call to PyEval_InitThreads handles this. With the interpreter
now thread aware and the main thread holding the GIL, the PyThread_start_new_thread is invoked
to actually create the new operating system thread. When a new thread is being spawned, a function
that the thread should call when it comes alive is passed to the thread. In this case, that function
is the _tbootstrap function in the Modules/_threadmodule.c module. A snapshot of this bootstrap
function is shown in listing 7.4.
tstate = boot->tstate;
tstate->thread_id = PyThread_get_thread_ident();
_PyThreadState_Init(tstate);
PyEval_AcquireThread(tstate);
nb_threads++;
res = PyEval_CallObjectWithKeywords(
boot->func, boot->args, boot->keyw);
...
The GIL is just a boolean variable (gil_locked) whose access is protected by a mutex
(gil_mutex), and whose changes are signalled by a condition variable (gil_cond).
gil_mutex is taken for short periods of time, and therefore mostly uncontended. In the
GIL-holding thread, the main loop (PyEval_EvalFrameEx) must be able to release the
GIL on demand by another thread. A volatile boolean variable (gil_drop_request) is
used for that purpose, which is checked at every turn of the eval loop. That variable is set
after a wait of interval microseconds on gil_cond has timed out. [Actually, another
volatile boolean variable (eval_breaker) is used which ORs several conditions into one.
Volatile booleans are sufficient as inter-thread signalling means since Python is run
on cache-coherent architectures only.] A thread wanting to take the GIL will first let
Interpreter and Thread States 84
What does the above mean for a newly spawned thread? The t_bootstrap function from listing 7.4
invokes the PyEval_AcquireThread function that handles requesting for the GIL. A lay explanation
for what happens when this request is made is thus assuming A is the main thread of execution
holding the GIL while B is the new thread being spawned.
1. When B is spawned, take_gil is invoked. This checks if the conditional gil_cond variable is
set. If it is not set then the thread starts a wait.
2. After wait time elapses, the gil_drop_request is set.
3. Thread A executing on the evaluation loop checks if the gil_drop_request variable is set on
each iteration of the loop.
4. Thread A drops the GIL when it detects that the gil_drop_request variable is set and then
also sets the gil_cond variable.
5. Thread A also waits on another variable - switch_cond, until the value of the gil_last_-
holder is set to a value other than thread A’s thread state pointer indicating that another
thread has taken the GIL.
6. Thread B now has the GIL and can go ahead to execute bytecode.
7. Thread A waits a given time, sets the gil_drop_request and the cycle continues.
To conclude this chapter, we recap the model we have created so far of the python virtual machine.
When the python executable is invoked with a file containing some valid source code content, first
Interpreter and Thread States 85
the interpreter and thread states are initialised and this is followed by the compilation of the source
file into a code object. The code object is then passed to the interpreter loop module where in order
to execute the code object, a frame object is created and attached to the main thread of execution.
So we have a python process that may contain one or more interpreter states and each interpreter
state may have one or more thread states and each thread states has references a frame that may
reference a frame and so on forming a stack of frames. Figure 7.2 provides a graphical representation
of this order.
In the next chapter, we show how all the parts that we have described enable the execution of a
python code object.
8. Intermezzo: The abstract.c Module
We have thus far mentioned multiple times that the python virtual machine generically treat values
for evaluation as PyObjects. This leaves the obvious question - How are operations safely carried
out on such generic objects ?. For example, when evaluating the bytecode instruction BINARY_ADD,
two PyObject values are popped from the evaluation stack and used as argument to an addition
operation but how does the virtual machine know if the values actually implement a protocol of
which the add operation is part of ?
To understand how alot of the operations on PyObjects work, we only have to look at the
Objects/Abstract.c module. This module defines a number of functions that work on objects that
implement a given object protocol. This means that for example, if one was adding two objects then
the add function in this module would expect that both objects implement the __add__ method of
the tp_numbers slots. The best way to explain this is to illustrate with an example.
Consider the case of the BINARY_ADD opcode, when it is applied to the addition of two numbers,
the PyNumber_Add function of the Objects/Abstract.c module is invoked. The definition of this
function is provided in listing 8.1.
Our interest at this point is in line 2 of the PyNumber_Add function from listing 8.1 - the call to
the binary_op1 function. The binary_op1 function is another generic function that takes among its
parameters, two values that are numbers or subclass of numbers and applies a binary function to
those two values; the NB_SLOT macro returns the offset of a given method into the PyNumberMethods
structure; recall that this structure is a collection of methods that work on numbers. The definition
of this generic binary_op1 function is included in listing 8.2 and an in-depth explanation of this
function immediately follows.
Intermezzo: The abstract.c Module 87
1. The function take three values, two PyObject * - v and w and an integer value, operation slot,
which is the offset of that operation into the PyNumberMethods structure.
2. Lines 3 and 4 define two values slotv and slotw that are structures that represent a binary
function as their types suggest.
Intermezzo: The abstract.c Module 88
3. From line 3 to line 13, we attempt to dereference the function given by op_slot argument
for both v and w. On line 8, there is a check for if both values are of the same type as there is no
need to dereference the second value’s function in the op_slot if both values are of the same
type. Even if both values are are not of the same type but the functions that were dereferenced
from both are equal then the slotw value is nulled out.
4. With the binary functions dereferenced, if slotv is not NULL then on line 15 we check that
slotw is not NULL and the type of w is a subtype of the type of v and if that is true, the slotw
function is applied to both v and w. This happens because if you pause to think about it for a
second, the method further down the inheritance tree is what we want to use not one further
up. If w is not a subtype then slotv is applied to both values at line 22.
5. Getting to line 27 means that the slotv function is NULL so we apply whatever slotw references
to both v and w so long as it is not NULL.
6. In the case where both slotv and slotw both do not contain a function then a Py_NotImple-
mented is returned. Py_RETURN_NOTIMPLEMENTED is just a macro that increments the reference
count of the Py_NotImplemented value before returning it.
The idea captured by the explanation given above is a blueprint for how the virtual machine is
able to perform operations on values that are supplied to it. We have simplified things a bit here
by ignoring opcodes that can be overloaded - for example the + symbol maps to the BINARY_ADD
opcode and can be applied to a string, a number or a sequence but in our example above we have
only looked at it being applied to numbers and subclasses of numbers. It is not too difficult to
imagine how overloaded operations are handled. In the case of the BINARY_ADD if one looks at the
PyNumber_Add function, one can see that if the value returned from the binary_op1 call is Py_-
NotImplemented then the virtual machine will attempt to treat the values as sequences and try
to dereference the sequence concatenation method that is then applied to the both values if they
implement the sequence protocol. Taking a step back to the interpeter loop in ceval.c, when we
observe the case for the evaluation of the BINARY_ADD opcode, we see the following snippet.
Ignore lines 1 and 2 as we discuss them when we talk about the interpreter loop. What we see from
the rest of the snippet, is that when we encounter the BINARY_ADD, the first port of call is a check
that both values are strings in order to apply string concatenation to the values. The PyNumber_Add
function from Objects/Abstract.c is then applied to both values if they are not strings. Although
the code seems a bit messy with the string check done in Python/ceval.c and the number and
sequence checks done in Objects/Abstract.c, it is pretty clear what is happening when we have
an overloaded opcode.
This explanation provided above is the way most opcode operations are handled - check the type
of the values being evaluated then dereference the method as required and apply to the argument
values.
9. The evaluation loop, ceval.c
We have finally arrived at the gut of the virtual machine - it is here that the virtual machine iterates
over python bytecode instructions of a code object and executes such instructions. This is achieved
using an actual for loop that iterates over an opcode switching on each type in order to run the
desired execution code. The Python/ceval.c module, about 5411 lines long, implements most of
the functionality required - at the heart of this function is the PyEval_EvalFrameEx function, an
approximately 3000 line long function that contains the actual evaluation loop. It is this PyEval_-
EvalFrameEx function that is the main thrust of our focus in the chapter.
The Python/ceval.c module provides platform specific optimizations such as threaded gotos as well
as python virtual machine optimizations such as opcode prediction. In this write-up, we are more
concerned with the virtual machine processes and optimizations so we conveniently disregard any
platform specific optimizations or process introduced here so long as it does not take away from our
explanation of the evaluation loop. We go into more detail than usual here so as to provide a solid
explanation for how the heart of the virtual machine is structured and works. It is important to note
that the opcodes and their implementations are constantly in flux so this description here may be
inaccurate at a later time.
Before any execution of bytecode can take place, a number of housekeeping operations such as
creating and initializing frames, setting up variables and initializing the virtual machine variables
such as instruction pointers have to be carried out. We get our feet wet with these
operations first explaining the setup processes that have to take place before the evaluation begins.
test()
Recall that code objects are created for code blocks; these code blocks could either be functions,
modules etc. so for a module with the above content, we can safely make the assumption that we
are dealing with two code objects - one for the module and one for the function test defined within
the module.
After the generation of the code object for the module in listing 9.0, the generated code object is
executed via a chain of function calls from the Python/pythonrun.c module - run_mod -> PyE-
val_EvalCode->PyEval_EvalCodeEx->_PyEval_EvalCodeWithName->PyEval_EvalFrameEx. At this
moment, our interest is lies with the _PyEval_EvalCodeWithName function that has a signature
shown in listing 9.1. It is this function that handles the name setup that is required before
bytecode evaluation in PyEval_EvalFrameEx. However, by looking at the function signature for the
_PyEval_EvalCodeWithName as shown in listing 9.1, one is probably left asking how this is related
to executing a module object rather than an actual function.
To wrap one’s head around this, one must think more generally in terms of code blocks and code
objects not functions or modules. Code blocks can have any or none of those arguments specified in
the _PyEval_EvalCodeWithName function signature - a function just happens to be a more specific
type of code block which has most if not all those values supplied. This means that the case of
executing _PyEval_EvalCodeWithName for a module code object is not very interesting as most of
those arguments are without value. The interesting instance occurs when a python function call is
made via the CALL_FUNCTION opcode. This results in a call to the fast_function function also in the
Python/ceval.c module. This function extracts function arguments from the function object before
delegating to the _PyEval_EvalCodeWithName function to carry out all the sanity checks that are
The evaluation loop, ceval.c 92
needed - this is not the full story but we will look at the CALL_FUNCTION opcode in more detail in a
later section of this chapter.
The _PyEval_EvalCodeWithName is quite a big function so we do not include it here but most of the
setup process that it goes through is pretty straightforward. For example, recall we mentioned that
the fastlocals field of a frame object provides some optimization for the local namespace and that
non-positional function arguments are fully known only at runtime; this basically means that we
cannot populate this fastlocals data structure without careful error checking. It is therefore during
this setup by the _PyEval_EvalCodeWithName function that the array referenced by the fastlocals
field of a frame is populated with the full range of local values. The steps involved in the setup
process that the _PyEval_EvalCodeWithName goes through when called involves the steps shown in
listing 9.1.
1. **stack_pointer: refers to the next free slot in the value stack of the execution frame.
Figure 9.0: Stack pointer after a single value has been pushed onto the stack
1. *next_instr: refers to the next instruction to be executed by the evaluation loop. One can
think of this as the program counter for the virtual machine. Python 3.6 changes the type of this
value to an unsigned short which is 2 bytes in size to handle the new bytecode instruction
size.
2. opcode: refers to the currently executing python opcode or the opcode that is about to be
executed.
3. oparg: refers to the argument of the currently executing opcode or opcode that is about to be
executed if it takes an argument.
4. why: The evaluation loop is an infinite loop implemented by the infinite for loop - for(;;) so
the loop needs a mechanism to break out of the loop and specify why the break occured. This
value refers to the reason for an exit from the evaluation loop. For example if the code block
exited the loop due to a return statement then this value will contain a WHY_RETURN status.
5. fastlocals: refers to an array of locally defined names.
6. freevars: refers to a list of names that are used within a code block but not defined in that
code block.
7. retval: refers to the return value from executing the code block.
8. co: References the code object that contains the bytecode that will be executed by the
evaluation loop.
9. names: references the names of all values in the code block of the executing frame.
10. consts: references the constants used by the code objects.
Bytecode instruction
We have discussed the format of bytecode instructions in the chapter on code objects but it is very
relevant to our discussion here so we repeat our description of the format of bytecode instructions
here.
Assuming we are working with python 3.6 bytecodes, all bytecodes are 16 bit long. The Python
VM uses a little endian byte encoding on the machine which I am currently typing out this book
thus the 16 bits of code are structured as shown in the following image with the opcode taking up
1 byte and the argument to the opcode taking up the second byte.
The evaluation loop, ceval.c 94
Extracting the opcodes and arguments involves some bit manipulation as we will see in the
following sections. It is important to note that since the the opcode is now two bytes and not
one, then the manipulation of instruction pointers subscribes to pointer manipulationa .
a
https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/pointer.html
The following macros play a very important role in the evaluation loop.
1. TARGET(op): expands to the case op statement. This matches the current opcode with the
block of code that implements the opcode.
2. DISPATCH: expands to continue. This together with the next macro - FAST_DISPATCH, handle
the flow of control of the evaluation loop after an opcode is executed.
3. FAST_DISPATCH: expands to a jump to the fast_next_opcode label within the evaluation for
loop.
With the introduction of the standard 2 bytes opcode in Python 3.6, the following set of macros are
used to handle code access.
1. INSTR_OFFSET(): This macros provides the byte offset of the current instruction into the array
of instructions. This expands to (2*(int)(next_instr - first_instr)).
2. NEXTOPARG(): This updates the opcode and oparg variable to the value of the opcode and
argument of the next bytecode instruction to be executed. This macro expands to the following
snippet
The OPCODE and OPARG macros handle the bit manipulation for extracting opcode and arguments.
Figure 9.0 shows the structure of a bytecode instruction with the argument to the opcode taking
lower eight bits and the opcode itself taking the upper eight bits hence OPCODE expands to ((word)
& 255) thus extracting the most significant byte from the bytecode instruction while OPARG which
expands to ((word) >> 8) extracts the least significant byte.
The evaluation loop, ceval.c 95
The last two macros defined above handle opcode prediction. When the evaluation loop encounters
a PREDICT(op) macro, the interpreter assumes that the next instruction to be executed is op. The
macros checks that this is indeed valid and if valid fetches the actual opcode and argument then
jumps to the label PRED_##op where the ## is a placeholder for the actual opcode. For example, if
we had encountered a prediction such as PREDICT(LOAD_CONST) then the goto statement argument
would be PRED_LOAD_CONSTop if that prediction was valid. An inspection of the source code for the
PyEval_EvalFrameEx function finds the PREDICTED(LOAD_CONST) label that expands to PRED_LOAD_-
CONSTop so on a successful prediction of this instruction, there is a jump to this label other wise
normal execution continues. This prediction saves the cost involved with the extra traversal of the
switch statement that would otherwise happen with normal code execution.
The next set of macros that we are interested in are the stack manipulation macros that handle
placing and fetching of values from the value stack of a frame object. These macros are pretty
similar and a few examples are shown in the following snippet.
1. STACK_LEVEL(): This returns the number of items on the stack. The macro expands to
((int)(stack_pointer - f->f_valuestack)).
2. TOP(): The returns the last item on the stack. This expands to (stack_pointer[-1]).
3. SECOND(): This returns the penultimate item on the stack. This expands to (stack_pointer[-
2]).
4. BASIC_PUSH(v): This places the item, v, on the stack. It expands to (*stack_pointer++ =
(v)). A current alias for this macro is the PUSH(v).
The evaluation loop, ceval.c 96
5. BASIC_POP(): This removes and returns an item from the stack. This expands to (*--stack_-
pointer). A current alias for this is the POP() macro.
The last set of macros of concern to us are those that handle local variable manipulation. These
macros, GETLOCAL and SETLOCAL are used to get and set values in the fastlocals array.
1. GETLOCAL(i): This expands to (fastlocals[i]). This handles the fetching of locally defined
names from the local array.
2. SETLOCAL(i, value): This expands to the snippet in listing 9.5. This macro sets the ith
element of the local array to the supplied value.
The UNWIND_BLOCK and UNWIND_EXCEPT_HANDLER are related to exception handling and we look at
them in subsequent sections.
def hello_world():
print("hello world")
A disassembly of the function from listing 9.7 is shown in listing 9.6 and we show how this set of
bytecode loops through the evaluation switch.
The evaluation loop, ceval.c 97
LOAD_GLOBAL 0 (0)
LOAD_CONST 1 (1)
CALL_FUNCTION 1 (1 positional, 0 keyword pair)
POP_TOP
LOAD_CONST 0 (0)
RETURN_VALUE
Figure 9.1 shows the evaluation path for the LOAD_GLOBAL and LOAD_CONST instructions. The second
and third blocks in both images of figure 9.2 represent housekeeping tasks that are carried out on
every iteration of the evaluation loop. TheGIL and signal handling checks were discussed in the
previous chapter on interpreter and thread states - it is during these checks that a thread executing
may give up control of the GIL for another thread to execute. The fast_next_opcode is a code label
just after the GIL and signal handling code that exist to serve as a jump destination when the loop
wishes to skip the previous checks as we will see when we look at the LOAD_CONST instruction.
The first instruction - LOAD_GLOBAL is evaluated by the LOAD_GLOBAL case statement of the switch
statement. The implementation of this opcode like other opcodes is a series of C statements and
function calls that is surprisingly involved as shown in listing 9.8. The implementation of the opcode
loads the value identified by the given name from the global or builtin namespace onto the evaluation
stack. The oparg is the index into the tuple which contains all names used within the code block -
co_names.
The evaluation loop, ceval.c 98
/* namespace 2: builtins */
v = PyObject_GetItem(f->f_builtins, name);
if (v == NULL) {
if (PyErr_ExceptionMatches(PyExc_KeyError))
format_exc_check_arg(
PyExc_NameError,
NAME_ERROR_MSG, name);
goto error;
}
}
}
The look-up algorithm for the LOAD_GLOBAL opcode first attempts to load the name from the
f_globals and f_builtins fields if they are dict objects otherwise it attempts to fetch the value
The evaluation loop, ceval.c 99
associatd with the name from the f_globals or f_builtins object with the assumption that they
implement some protocol for fetching value associated with a given name. If this value is found, it
is loaded on to the evaluation stack using the PUSH(v) otherwise an error is set and there is a jump
to the error code label for error handling. As the flow chart shows, after this value is pushed onto
the evaluation stack, the DISPATCH() macro is called which is basically an alias for the continue
statement.
The second diagram labelled 2 in the figure 9.1 shows the execution of the LOAD_CONST. Listing 9.9
is an the implementation of the LOAD_CONST opcode.
This goes through the normal setup as LOAD_GLOBAL but after execution, FAST_DISPATCH() is called
rather than DISPATCH(). This causes a jump to the fast_next_opcode code label from where the
loop execution continues skipping the signal and GIL checks on the next iteration. Opcodes that
have implemenations that make C function calls make of the DISPATCH macro while opcodes like
the LOAD_GLOBAL that do not make C function calls in their implementation make use of the FAST_-
DISPATCH macro. This means that the GIL can only be given up after executing opcodes that make
C function calls.
The next opcode that is executed is the CALL_FUNCTION opcode as shown in first image from figure
9.2. This opcode is emitted by the compiler when a function call is made with only positional
The evaluation loop, ceval.c 100
arguments used in the call. The implemenation of this opcode is shown in listing 9.10. At the heart
of the opcode implementation is the call_function(&sp, oparg, NULL). oparg is the number of
arguments passed to the function and the call_function function reads that number of values
from the evaluation stack.
The next instruction shown in diagram 4 of figure 9.2 is the POP_TOP instruction that removes a
single value from the top of the evaluation stack - this clears any value placed on the stack by the
previous function call.
The next set of instructions are the LOAD_CONST and RETURN_VALUE pair shown in diagrams 5 and 6 of
figure 9.3. The LOAD_CONST opcode loads a None value onto the evaluation stack for the RETURN_VALUE
to work with. These tow always go together when a python function does not explicitly return any
value. We have already looked at the mechanics of the LOAD_CONST instruction. The RETURN_VALUE
instruction pops the top of the stack into the retval variable, set the WHY status code to WHY_RETURN
and then performs a jump to the fast_block_end code label. The execution continues from there
breaking out of the for loop and then subsequently returning the value of the retval variable to
the calling function.
Notice that a lot of the code snippets thatwe have looked at have the goto error jump but we have
intentionally discussing errors and exceptions out so far. We will look at exception handling in the
next chapter. Although the function looked at in this section is rather trivial, it encapsulates the
main behaviour of the evaluation loop while executing bytecode instructions. Any other opcode
may have an implementation that is a bit more complicated but the essence of the execution is the
same as described above.
Next we look at some other interesting opcodes supported by the python virtual machine.
1. MAKE_FUNCTION: As the name suggest the opcode creates a function object from values on the
the evaluation stack. Consider a module containing the functions shown in listing 9.11.
Listing 9.11: Function definitions in a module
def test_non_local(arg, *args, defarg="test", defkwd=2, **kwd):
local_arg = 2
print(arg)
print(defarg)
print(args)
print(defkwd)
print(kwd)
def hello_world():
print("Hello world!")
A disassembly of the code object formt he compilation of the module gives the set of bytecode
instructions shown in listing 9.12
The evaluation loop, ceval.c 102
We can see that the MAKE_FUNCTION opcode appears twice in the series of bytecode instructions
- one for each function definition within the module. The implementation of the MAKE_FUNC-
TION creates a function object and then stores the function in the local namespace using
the function definition name. Notice that default arguments are pushed on the stack when
such arguments are defined. The MAKE_FUNCTION implementation consumes these values by
and’ing the oparg with a bitmask and popping values from the stack accordingly.
Listing 9.12: MAKE_FUNCTION opcode implementation
TARGET(MAKE_FUNCTION) {
PyObject *qualname = POP();
PyObject *codeobj = POP();
PyFunctionObject *func = (PyFunctionObject *)
PyFunction_NewWithQualName(codeobj, f->f_globals, qualname);
Py_DECREF(codeobj);
Py_DECREF(qualname);
if (func == NULL) {
goto error;
}
}
if (oparg & 0x04) {
assert(PyDict_CheckExact(TOP()));
func->func_annotations = POP();
}
if (oparg & 0x02) {
assert(PyDict_CheckExact(TOP()));
func->func_kwdefaults = POP();
}
if (oparg & 0x01) {
assert(PyTuple_CheckExact(TOP()));
func->func_defaults = POP();
}
PUSH((PyObject *)func);
DISPATCH();
}
The LOAD_ATTR opcode implementation is pretty simple and shown in listing 9.14.
The evaluation loop, ceval.c 104
The PyObject_GetAttr function is one we have looked at in the chapter on objects. This
function de-references whatever value is in the tp_getattro attribute of the object and uses
that to get load the value of the attribute on to the top of the value stack. One can review the
chapter on objects to get a better understanding of how this works.
3. CALL_FUNCTION_KW: This opcode very similar in functionality to the CALL_FUNCTION opcode
that was discussed previously but is used for function calls with keyword arguments. The
implementation for this opcode is shown in listin 9.15. Notice how one of the major change
from the implementation of the CALL_FUNCTION opcode is that a tuple of names is now passed
as one of the arguments when call_function is invoked.
Listing 9.15: CALL_FUNCTION_KW opcode implementation
PyObject **sp, *res, *names;
names = POP();
assert(PyTuple_CheckExact(names) && PyTuple_GET_SIZE(names) <= oparg);
PCALL(PCALL_ALL);
sp = stack_pointer;
res = call_function(&sp, oparg, names);
stack_pointer = sp;
PUSH(res);
Py_DECREF(names);
if (res == NULL) {
goto error;
}
The names are the keyword arguments of the function call and they are used in the _PyEval_-
EvalCodeWithName to handle the setup before the code object for the function is executed.
The evaluation loop, ceval.c 105
This caps our explanation of the evaluation loop. As we have seen, the concept behind the evaluation
loop are not terribly complicated - opcodes each have implementations that are defined in C and
these implemenations are the actual do work functions. One very important area that we have not
touched upon is exception handling and the block stack, two intimately connected concepts that we
look at in the following chapter.
10. The Block Stack
One of the data structures that does not get as much coverage as it should is the block stack -
the other stack within a frame object. Most discussions of the Python VM just mention the block
stack passingly but then focus on the evaluation stack. However, the block stack is pretty important
- there are probably other ways to implement exception handling but as we will see while we
progress through this chapter, using a stack, the block stack, makes it incredibly simple to implement
exception handling. The block stack and exception handling are so intertwined that one will not
fully understand the need for the block stack without actually taking exception handling into
consideration. The block stack is also used for loops but it is difficult to see a reason for block stacks
with loops until one looks at how loop constructs like break interact with exception handlers so lets
get straight down to the details. The block stack makes the implementation of such interactions a
straightforward affair.
The block stack is a stack data structure field within a frame object. Just like the evaluation stack of
the frame, values are pushed to and popped from the block stack during the execution of a frame’s
code. However the block stack is used only for handling loops and exceptions. The best way to
explain the block stack is with an example so we illustrate with a simple try...finally construct
within a loop as shown in the snippet in listing 10.0.
def test():
for i in range(4):
try:
break
finally:
print("Exiting loop")
When the function from listing 10.0 is disassembled, the result is shown in listing 10.1.
The Block Stack 107
4 16 BREAK_LOOP
18 POP_BLOCK
20 LOAD_CONST 0 (None)
For a simple function, listing 10.1 is a lot of opcodes but this is due to the combination of a for loop
and try .. finally constructs. The opcodes of interest here are the SETUP_LOOP and SETUP_FINALLY
opcodes so we look at the implementations of these to get the gist of what it does (all SETUP_* opcodes
map to the same implementation).
The implementation for the SETUP_LOOP opcode is a simple function call - PyFrame_BlockSetup(f,
opcode, INSTR_OFFSET() + oparg, STACK_LEVEL());. The arguments are pretty self explanatory -
f is the frame, opcode is the currently executing opcode, INSTR_OFFSET() + oparg is the instruction
delta for the next instruction after that block (for the above code the delta is 50 for the SETUP_LOOP)
and the STACK_LEVEL denotes how many items are on the value stack of the frame. The function
call creates a new block and push it on the block stack. The information contained in this block is
enough for the virtual machine to continue execution should something happen while in that block.
The implementation of this function is shown in listing 10.2.
The Block Stack 108
The handler in listing 10.2 is the pointer to the next instruction that should be executed after the
SETUP_* block. It is best to illustrate the example from above with a graphical representation of the
execution process and figure 10.0 illustrates this with a portion of the bytecode.
The Block Stack 109
Figure 10.0: How the block stack changes with SETUP_* instructions
The Block Stack 110
Figure 10.0 shows how the block stack varies as each instruction is executed.
In the first diagram of the figure 10.0, the SETUP_LOOP opcode is executed and a single SETUP_LOOP
block is placed on the block stack. The handler for this block is the instruction at offset 36 so when
this stack is popped under normal execution, the interpreter will jump to that offset and continue
execution from there. When the SETUP_FINALLY opcode is encountered, another block is pushed onto
the block stack. We can see that as the stack is Last In First Out data structure, the finally block will
be the first out - recall that finally must be executed regardless of the break statement.
The real place where the use of a block stack shines is when the break statement is encountered
while inside the exception handler within the loop. When the BREAK_LOOP opcode is executed, the
why variable is set to WHY_BREAK and a jump is executed to the fast_block_end code label as shown in
the second diagram of figure 10.0 where block stack unwinding is handled. Unwinding is just a fancy
name for popping blocks on the stack and executing their handler. So in this case, the SETUP_FINALLY
block is popped of the stack and the interpreter jumps to its handler at bytecode offset 22. Normal
execution continues from that offset till the END_FINALLY statement is encountered. Since the why
code is a WHY_BREAK, a jump is executed to the fast_block_end code label once again where more
stack unwinding happens - the loop block is left on the stack. This time around (not shown in figure
10.0), the block popped from the stack has a handler at byte offset 36 so the execution continues at
that bytecode offset thus exiting the loop and continuing with normal execution.
The use of the block stack greatly simplifies the implementation of the virtual machine implemen-
tation. If the loops were not implemented with a block stack, an opcode such as the BREAK_LOOP will
need a jump destination. If we then throw in a try..finally construct with that break statement
we get a complex implementation where we would have to keep track of optional jump destinations
within finally blocks and so on.
def test1():
try:
2 + 's'
except Exception:
print("Caught exception")
The opcodes generated for the simple function in listing 10.3 are shown in listion 10.4.
The Block Stack 111
3 2 LOAD_CONST 1 (2)
4 LOAD_CONST 2 ('s')
6 BINARY_ADD
8 POP_TOP
10 POP_BLOCK
12 JUMP_FORWARD 28 (to 42)
4 >> 14 DUP_TOP
16 LOAD_GLOBAL 0 (Exception)
18 COMPARE_OP 10 (exception match)
20 POP_JUMP_IF_FALSE 40
22 POP_TOP
24 POP_TOP
26 POP_TOP
5 28 LOAD_GLOBAL 1 (print)
30 LOAD_CONST 3 ('Caught exception')
32 CALL_FUNCTION 1
34 POP_TOP
36 POP_EXCEPT
38 JUMP_FORWARD 2 (to 42)
>> 40 END_FINALLY
>> 42 LOAD_CONST 0 (None)
44 RETURN_VALUE
We should have a conceptual idea of how this code block will execute if an exception should
occur given the previous explanation. In summary, we expect the PyNumber_Add function from the
Objects/abstract.c module to return a NULL for the BINARY_ADD opcode. Buried in the detail of that
is the fact that in addition to returning a NULL value, the functions there also set exception values
on the thread state data structure of the currently executing thread. Recall, that the thread state has
the curexc_type, curexc_value and curexc_traceback fields for holding the current exception in
the thread of execution; these fields prove very useful while unwinding the block stack in search of
exception handlers. You can follow the chain of function calls from the binop_type_error function
in the Objects/abstract.c module all the way to the PyErr_Restore function within the same
module where the values are set on the currently executing thread.
With the exception values set on the currently executing thread and a NULL value returned from
the function call, the interpreter loop executes a jump to the error label where all the magic of
exception handling occurs or not. For our trivial example above, we have only one block on the
The Block Stack 112
block stack, the SETUP_EXCEPT block with a handler at bytecode offset 14. Once the jump to error
handler label occurs, the stack unwinding can begin. The exception values - traceback, exception
value and exception type are pushed on top of the value stack, the SETUP_EXCEPT handler is popped
from the block stack and then there is a jump to the handler - byte offset 14 in this case from
where execution continues. Now observe the bytecodes from bytecode offset 16 to bytecode offset
20 in listing 10.4 - the Exception class is loaded onto the stack and then this is compared with the
exception that was raised and is present on the stack. If the exceptions match then normal execution
can continue with the popping of exception and tracebacks from value stack and execution of any of
the error handler code. When there is no exception match, the END_FINALLY instruction is executed
and since there is still an exception on the stack there is a break from the exception loop.
In the case where there is no exception handling mechanism, the opcodes for the test function are
more straightforward as shown in listing 10.5.
Listing 10.5: Disassembly of function in listing 10.3 when there is no exception handling
2 0 LOAD_CONST 1 (2)
2 LOAD_CONST 2 ('s')
4 BINARY_ADD
6 POP_TOP
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
The opcodes do not put anything on the block stack so when an exception happens and there is a
jump to the error handling label, there is no block to be unwound from the stack causing the loop
to be exit and the error to be dumped.
This doesn’t capture the whole inner workings of the exception handling mechanism but it provides
a coverage of the fundamentals of the interaction between the block stack and error handling in the
python virtual machine. There remains other details such as the case of an exception being thrown
while handling an exception, the case of nested exception handlers and nested exceptions and so on.
However, we conclude this short intermezzo at this point.
11. From Class code to bytecode
We have covered a lot of ground discussing the nuts and bolts of how the python virtual machine or
interpreter (whichever you want to call it) executes your code but for an object oriented language
like Python, we have actually left out one of the most important parts - the nuts and bolts of how a
user defined class gets compiled down to bytecode and executed.
From our discussion on Python objects, we have a rough idea of how new classes may be created
but that intuition may not totally capture the whole process from the class ... definition of a user
to actual bytecode that creates new class objects so this chapter aims to bridge that gap and provide
an exposition on how this process occurs.
As usual we start with a very simple user defined class module as shown in listing lisitng 11.0.
class Person:
When a module containing the above class definition is disassembled with the dis module, the
stream of bytecode shown in listing 11.1 is output.
0 LOAD_BUILD_CLASS
2 LOAD_CONST 0 (<code object Person at 0x102298b70, file \
"string", line 2>)
4 LOAD_CONST 1 ('Person')
6 MAKE_FUNCTION 0
8 LOAD_CONST 1 ('Person')
10 CALL_FUNCTION 2
12 STORE_NAME 0 (Person)
14 LOAD_CONST 2 (None)
16 RETURN_VALUE
We are interested in bytes 0 to bytes 12 as these are the actual opcodes that create the new class object
and store it so that it can be referenced by its name (Person in our example). Before, we expand on the
From Class code to bytecode 114
opcodes above, we look at the process of class creation as specified by the Python documentation1 .
The description of the process in the documentation though done at a very high level is pretty clear.
It can be surmised from the python documentation2 , the behind the scenes process of class creation
involves roughly the following processes in no particular order.
During the final step, the class object is created by instantiating the type class, passing in the
class name, base classes and class dictionary as arguments. Any __prepare__ hooks are run prior
to instantiating the class object. The metaclass used in the class object creation can be explicitly
specified by supplying the metaclass keyword argument in the class definition. In the case that this
is not supplied, the class statement examines the first entry in the tuple of the the base classes if any. If
no base classes are used, the global variable __metaclass__ is searched for and if no value is found
for this, Python uses the default meta-class. More about meta-classes is discussed in subsequent
chapters.
The whole class creation process starts with the loading of the __build_class function onto the
value stack. This is the function responsible for all the heavy lifting in creating new types/classes.
Follwoign this, we have step 1 of the creation process mentioned above which is done during the
compilation phase; during this step, code object representing the body of the class is loaded on the
stack by the instruction at offset 2 - LOAD_CONST. This code object is wrapped into a function object
by the MAKE_FUNCTION opcode and it will soon become clear why this happens. By the time, the
execution loop gets to offset 10, the evaluation stack looks similar to that in Figure 11.0.
1
https://docs.python.org/3.6/reference/datamodel.html#customizing-class-creation
2
https://docs.python.org/3.6/reference/datamodel.html#customizing-class-creation
From Class code to bytecode 115
At offset 10, CALL_FUNCTION handles invokes the __build_class function with the values above it on
the evaluation stack as argument. This function is defined in the Python/bltinmodule.c module. A
significant part of the function is devoted to sanity checks - check that right arguments are supplied,
that they have the correct type etc. After these sanity checks, the function then has to decide on the
right metaclass. We state the rules for determining the correctly metaclass verbatim as described in
the python documentation3 .
The most derived metaclass is selected from the explicitly specified metaclass (if any)
and the metaclasses (i.e. type(cls)) of all specified base classes. The most derived
metaclass is one which is a subtype of all of these candidate metaclasses. If none of
the candidate metaclasses meets that criterion, then the class definition will fail with
TypeError.
The actual snippet from the __build_class function that handles the metaclass resolution is shown
in listing 11.2 and it has been annotated a bit more to provide some more clarity.
3
https://docs.python.org/3.5/reference/datamodel.html#determining-the-appropriate-metaclass
From Class code to bytecode 116
...
/* kwds are values passed into brackets that follow class name
e.g class(metalcass=blah)*/
if (kwds == NULL) {
meta = NULL;
mkw = NULL;
}
else {
mkw = PyDict_Copy(kwds); /* Don't modify kwds passed in! */
if (mkw == NULL) {
Py_DECREF(bases);
return NULL;
}
/* for all intent and purposes &PyId_metaclass references the string\
"metaclass"
but the &PyId_* macro handles static allocation of such strings */
With the metaclass found, __build_class then proceeds to check if any __prepare__ attribute exists
on the metaclass; if any such attribute exists the class namespace is prepared by executing the __-
prepare__ hook passing the class name, class bases and any additional keyword arguments from the
class definition. This hook can be used to customize class behaviour. The following example in listing
11.3 which is taken from the example on metaclass definition and use of the python documentation4
shows an example of how the __prepare__ hook can be used to implement a class with attribute
ordering.
class OrderedClass(type):
@classmethod
def __prepare__(metacls, name, bases, **kwds):
return collections.OrderedDict()
class A(metaclass=OrderedClass):
def one(self): pass
def two(self): pass
def three(self): pass
def four(self): pass
>>> A.members
('__module__', 'one', 'two', 'three', 'four')
The __build_class function returns an empty new dict if there is no __prepare__ attribute defined
on the metaclass but in the event that there is one, the namespace that is used is the result of executing
the __prepare__ attribute as the following snippet in listing 11.4.
4
https://docs.python.org/3.6/reference/datamodel.html#metaclass-example
From Class code to bytecode 118
...
// get the __prepare__ attribute
prep = _PyObject_GetAttrId(meta, &PyId___prepare__);
if (prep == NULL) {
if (PyErr_ExceptionMatches(PyExc_AttributeError)) {
PyErr_Clear();
ns = PyDict_New(); // namespace is a new dict if __prepare__ is \
not defined
}
else {
Py_DECREF(meta);
Py_XDECREF(mkw);
Py_DECREF(bases);
return NULL;
}
}
else {
/** where __prepare__ is defined, the namespace is the result of exe\
cuting
the __prepare__ attribute **/
PyObject *pargs[2] = {name, bases};
ns = _PyObject_FastCallDict(prep, pargs, 2, mkw);
Py_DECREF(prep);
}
if (ns == NULL) {
Py_DECREF(meta);
Py_XDECREF(mkw);
Py_DECREF(bases);
return NULL;
}
...
Afer handling the __prepare__ hook, it is now time for the actual class object to be created. First of
all, the code object for the body of the class is executed within the namespace created in the previous
paragraph. To understand why this is so, we disassemble the code object for the body of the class
defined at the start of this chapter in listing 11.5.
From Class code to bytecode 119
Listing 11.5: Disassembly of code object for class body from listing 11.0
1 0 LOAD_NAME 0 (__name__)
2 STORE_NAME 1 (__module__)
4 LOAD_CONST 0 ('test')
6 STORE_NAME 2 (__qualname__)
When this code object is executed, the namespace will contain all the attributes of the class i.e. class
attributes, methods etc. This namespace is then used in the next stage of the process as an argument
for a function call to the metaclass as shown in listing 11.6.
Assuming we are using the type metaclass, calling the type means dereferencing the attribute in the
tp_call slot of the class. The tp_call function then in turn dereferences the attribute in the tp_new
slot which actually creates and returns our brand new class object. The cls value returned is then
placed back on the stack and stored to the Person variable. There we have it, the process of creating
a new class and this is really all there is to it in Python.
12. Generators: Behind the scenes.
Generators are one of the really beautiful concepts in python. A generator function is a function that
contain a yield statement and when a generator function is invoked it returns a generator. A very
simple use of generators in python is as an iterator that produces values for an iteration on demand.
Listing 12.0 is a very simple example of a generator function that returns a generator that produces
values from 0 up to n.
def firstn(n):
num = 0
while num < n:
v = yield num
print(v)
num += 1
firstn is a generator fucntion so calling the firstn function with a value does not return a simple
value like a conventional function would do but rather it returns a generator object which captures
the continuation of the computation. We can then use the next function to get successive values
from the returned generator object or the send method of the generator object to send values into
the generator object.
In this chapter, we are not interested in the semantics of the generators objects or how the generators
are or should be used. Our interest lies with the nuts and bolts of how generators are implemented
under the covers in CPython. We are interested in how it is possible to suspend a computation and
then subsequently resume such computation. We look at the data structures and ideas behind this
concept and surprisingly they are not too complicated. First, we look at the C implementation of a
generator object.
typedef struct {
/* The gi_ prefix is intended to remind of generator-iterator. */
_PyGenObject_HEAD(gi)
} PyGenObject;
1. prefix##_frame: This field references a frame object. This frame object contains the code
object of a generator and it is within this frame that the execution of the generator object’s
code object takes place.
2. prefix##_running: This is a boolean field that indicates whether the generator is running.
3. prefix##_code: This field references the code object associated with the generator. This is the
code object that executes whenever the generator is running.
4. prefix##_name: This is the name of the generator - in listing 12.0, the value is firstn.
5. prefix##_qualname: This is the fully qualified name of the generator. Most times this value
is the same as that of prefix##_name.
Creating generators
When a generator function is called, the generator function does not run to completion and return
a value rather a generator object is returned. This is possible because of the CO_GENERATOR flag that
Generators: Behind the scenes. 122
is set during the compilation of a generator function and this flags comes in very useful during the
setup process that happens just before the code object execution.
During the execution of the code object for the function, recall the _PyEval_EvalCodeWithName is
invoked to perform some setup. As part of the setup process, a check of the CO_GENERATOR flag of
the function code object is performed and in the case where it is set, rather than call the evaluation
loop function, a generator object is created and returned to the caller. The magic happens at the last
code block of the _PyEval_EvalCodeWithName as shown in listing 12.2.
Listing 12.2: _PyEval_EvalCodeWithName returns a generator object when processing a code object with
generator flag
/* Handle generator/coroutine/asynchronous generator */
if (co->co_flags & (CO_GENERATOR | CO_COROUTINE | CO_ASYNC_GENERATOR)) {
PyObject *gen;
PyObject *coro_wrapper = tstate->coroutine_wrapper;
int is_coro = co->co_flags & CO_COROUTINE;
PCALL(PCALL_GENERATOR);
return gen;
}
What we can see from listing 12.2 is that bytecode for a generator function code object is never
executed when the generator function is called - the execution of bytecode only happens when the
returned generator object is executing and we look at this next.
To get a more indepth look at a generator function, we look at the generator function from listing
12.0. The disassembly of the generator function from listing 12.0 results in the set of bytecode shown
in listing 12.3.
4 0 LOAD_CONST 1 (0)
2 STORE_FAST 1 (num)
6 14 LOAD_FAST 1 (num)
16 YIELD_VALUE
18 STORE_FAST 2 (v)
7 20 LOAD_GLOBAL 0 (print)
22 LOAD_FAST 2 (v)
24 CALL_FUNCTION 1
26 POP_TOP
8 28 LOAD_FAST 1 (num)
30 LOAD_CONST 2 (1)
32 INPLACE_ADD
34 STORE_FAST 1 (num)
36 JUMP_ABSOLUTE 6
>> 38 POP_BLOCK
>> 40 LOAD_CONST 0 (None)
42 RETURN_VALUE
When the execution of the bytecode shown in listing 12.3 for the generator function gets to the
YIELD_VALUE opcode at byteoffset 16, that opcode causes the evaluation to suspend and return the
value on the top of the stack to the caller. By suspend, we mean the evaluation loop for the currently
executing frame is exited however this frame is not deallocated because it is still referenced by the
generator object so execution of the frame can continue again when PyEval_EvalFrameEx is invoked
with the frame as one of its arguments.
Python generators do more than just generate values, they can also consume values by using the
generator send method. This is possible because yield is an expression that evaluates to a value.
When the send method is called on a generator with a value, the gen_send_ex method places the
Generators: Behind the scenes. 125
value onto the evaluation stack of the generator object frame before calling evaluating the frame
object. From listing 12.3, the instruction that follows the YIELD_VALUE instruction is the STORE_FAST
that stores whatever value is on to the top of the stack to the name on the left side of the assignment.
In the case where there is no send function call then the value that is placed on the top of the stack
is None.