Cython Cise PDF
This article is published in IEEE Computing in Science and Engineering. Please refer to
the published version if accessible, as it contains editors' improvements. (c) 2011 IEEE.
Permalink: http://dx.doi.org/10.1109/MCSE.2010.118
Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Abstract

Cython is an extension to the Python language that allows explicit type declarations and is compiled directly to C. This addresses Python's large overhead for numerical loops and the difficulty of efficiently making use of existing C and Fortran code, which Cython code can interact with natively. The Cython language combines the speed of C with the power and simplicity of the Python language.

Introduction

Python's success as a platform for scientific computing to date is primarily due to two factors. First, Python tends to be readable and concise, leading to a rapid development cycle. Second, Python provides access to its internals from C via the Python/C API. This makes it possible to interface with existing C, C++, and Fortran code, as well as write critical sections in C when speed is essential.

Though Python is plenty fast for many tasks, low-level computational code written in Python tends to be slow, largely due to the extremely dynamic nature of the Python language itself. In particular, low-level computational loops are simply infeasible. Although NumPy [NumPy] eliminates the need for many such loops, there are always going to be computations that can only be expressed well through looping constructs. Cython aims to be a good companion to NumPy for such cases.

Given the magnitude of existing, well-tested code in Fortran and C, rewriting any of this code in Python would be a waste of our valuable resources. A big part of the role of Python in science is its ability to couple together existing components instead of reinventing the wheel. For instance, the Python-specific SciPy library contains over 200 000 lines of C++, 60 000 lines of C, and 75 000 lines of Fortran, compared to about 70 000 lines of Python code. Wrapping of existing code for use from Python has traditionally been the domain of the Python experts, as the Python/C API has a high learning curve. While one can use such wrappers without ever knowing about their internals, they draw a sharp line between users (using Python) and developers (using C with the Python/C API).

Cython solves both of these problems, by compiling Python code (with some extensions) directly to C, which is then compiled and linked against Python, ready to use from the interpreter. Through its use of C types, Cython makes it possible to embed numerical loops, running at C speed, directly in Python code. Cython also significantly lowers the learning curve for calling C, C++ and Fortran code from Python. Using Cython, any programmer with knowledge of both Python and C/C++/Fortran can easily use them together.
In this paper, we present an overview of the Cython language and the Cython compiler in several examples. We give guidelines on where Cython can be expected to provide significantly higher performance than pure Python and NumPy code, and where NumPy is a good choice in its own right. We further show how the Cython compiler speeds up Python code, and how it can be used to interact directly with C code. We also cover Fwrap, a close relative of Cython. Fwrap is used for automatically creating fast wrappers around Fortran code to make it callable from C, Cython, and Python.

Cython is based on Pyrex [Pyrex] by Greg Ewing. It's been one of the more friendly forks in open source, and we are thankful for Greg's cooperation. The two projects have somewhat different goals. Pyrex aims to be a smooth blend of Python and C, while Cython focuses more on preserving Python semantics where it can. Cython also contains some features for numerical computation that are not found in Pyrex (in particular fast NumPy array access). While there is a subset of syntax that will work both in Pyrex and Cython, the languages are diverging and one will in general have to choose one or the other. For instance, the syntax for calling C++ code is different in Pyrex and Cython, since this feature was added long after the fork.

There are other projects that make possible the inclusion of compiled code in Python (e.g. Weave and Instant). A comparison of several such tools can be found in [comparison]. Another often used approach is to implement the core algorithm in C, C++ or Fortran and then create wrappers for Python. Such wrappers can be created with Cython or with more specialized tools such as SWIG, ctypes, Boost.Python or F2PY. Each tool has its own flavor. SWIG is able to automatically wrap C or C++ code while Cython and ctypes require redeclaration of the functions to wrap. SWIG and Cython require a compilation stage which ctypes does not. On the other hand, if one gets a declaration wrong using ctypes it can result in unpredictable program crashes without prior warning. With Boost.Python one implements a Python module in C++ which, depending on who you ask, is either a great feature or a great disadvantage. Finally, numexpr (http://code.google.com/p/numexpr/) and Theano (http://deeplearning.net/software/theano/) are specialized tools for quickly evaluating numerical expressions (see below).

To summarize, Cython could be described as a swiss army knife: It lacks the targeted functionality of more specialized tools, but its generality and versatility allow its application in almost any situation that requires going beyond pure Python code.

Cython at a glance

Cython is a programming language based on Python, that is translated into C/C++ code, and finally compiled into binary extension modules that can be loaded into a regular Python session. Cython extends the Python language with explicit type declarations of native C types. One can annotate attributes and function calls to be resolved at compile-time (as opposed to runtime). With the extra information from the annotations, Cython is able to generate code that sidesteps most of the usual runtime costs.

The generated code can take advantage of all the optimizations the C/C++ compiler is aware of without having to re-implement them as part of Cython. Cython integrates the C language and the Python runtime through automatic conversions between Python types and C types, allowing the programmer to switch between the two without having to do anything by hand. The same applies when calling into external libraries written in C, C++ or Fortran. Accessing them is a native operation in Cython code, so calling back and forth between Python code, Cython code and native library code is trivial.

Of course, if we're manually annotating every variable, attribute, and return type with type information, we might as well be writing C/C++ directly. Here is where Cython's approach of extending the Python language really shines. Anything that Cython can't determine statically is compiled with the usual Python semantics, meaning that you can selectively speed up only those parts of your program that expose significant execution times. The key thing to keep in mind in this context is the Pareto Principle, also known
value range and the type of the loop variable allow it. Similarly, when iterating over a sequence, it is sometimes required to know the current index inside of the loop body. Python has a special function for this, called enumerate(), which wraps the iterable in a counter:

    f = open('a_file.txt')
    for line_no, line in enumerate(f):
        # prepend line number to line
        print("%d: %s" % (line_no, line))

Cython knows this pattern, too, and reduces the wrapping of the iterable to a simple counter variable, so that the loop can run over the iterable itself, with no additional overhead. Cython's for loop has optimizations for the most important built-in Python container and string types and it can even iterate directly over low-level types, such as C arrays of a known size or sliced pointers:

    cdef char* c_string = get_pointer_to_chars(10)
    cdef char char_val

    # check if chars at offsets 3..9 are
    # any of 'abcABC'
    for char_val in c_string[3:10]:
        print(char_val in b'abcABC')

are easy to add. It therefore becomes reasonable for code writers to stick to the simple and readable idioms of the Python language, to rely on the compiler to transform them into well specialized and fast C language constructs, and to only take a closer look at the code sections, if any, that still prove to be performance critical in benchmarks.

Apart from its powerful control flow constructs, a high-level language feature that makes Python so productive is its support for object-oriented programming. True to the rest of the language, Python classes are very dynamic: methods and attributes can be added, inspected, and modified at runtime, and new types can be dynamically created on the fly. Of course this flexibility comes with a performance cost. Cython allows one to statically compile classes into C-level struct layouts (with virtual function tables) in such a way that they integrate seamlessly into the Python class hierarchy without any of the Python overhead. Though much scientific data fits nicely into arrays, sometimes it does not, and Cython's support for compiled classes allows one to efficiently create and manipulate more complicated data structures like trees, graphs, maps, and other heterogeneous, hierarchical objects.
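As a rough illustration of the dynamism of Python classes described above, here is a small pure-Python sketch. It is not Cython code and not taken from the article: the names Node, Leaf, SlotNode and tree_sum are hypothetical, and the `__slots__` class is only a loose analogy to a type with a fixed layout, not a description of how Cython compiles classes.

```python
class Node:
    """Ordinary Python class: each instance carries its own attribute dict."""

    def __init__(self, value):
        self.value = value


# Attributes and methods can be attached after the class has been defined.
node = Node(1)
node.label = "root"                          # new instance attribute at runtime
Node.double = lambda self: 2 * self.value    # new method added to the class

# New types can be created on the fly with the three-argument form of type().
Leaf = type("Leaf", (Node,), {"is_leaf": True})


class SlotNode:
    """Fixed attribute set (hypothetical analogy to a statically laid out type)."""

    __slots__ = ("value", "left", "right")

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def tree_sum(n):
    """Recursively sum the values stored in a binary tree of SlotNode objects."""
    if n is None:
        return 0
    return n.value + tree_sum(n.left) + tree_sum(n.right)


# A small heterogeneous structure that does not fit naturally into an array.
tree = SlotNode(1, SlotNode(2), SlotNode(3, SlotNode(4)))
```

Trying to set an undeclared attribute on a SlotNode instance raises AttributeError, which hints at the trade-off the text describes: giving up some runtime flexibility in exchange for a predictable layout.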
Consider computing a simple expression for a large number of different input values, e.g.:

    v = np.sqrt(x**2 + y**2 + z**2)

where the variables are arrays for three vectors x, y and z. In most cases, one does not need to use Cython here: the expression is easily written with pure NumPy operations that are already optimized and usually fast enough.

The exceptions are for either extremely small or large amounts of data. For small data sets that are evaluated many, many times, the Python overhead of the NumPy expression will dominate, and making a loop in Cython removes this overhead. For large amounts of data, NumPy has two problems: it requires large amounts of temporary memory, and it repeatedly moves temporary results over the memory bus. In most scientific settings the memory bus can easily become the main bottleneck, not the CPU (for a detailed explanation see [Alted]). In the example above, NumPy will first square x in a temporary buffer, then square y in another temporary buffer, then add them together using a third temporary buffer, and so on.

In Cython, it is possible to manually write a loop running at native speed:

In the case of numerical optimization or equation solving, the algorithm in question must be handed a function (a callback) which evaluates the function. The algorithm then relies on making new steps depending on previously computed function values, and the process is thus inherently sequential. Depending on the nature and size of the problem, different levels of optimization can be employed.

For medium-sized to large problems, the standard scientific Python routines integrate well with Cython. One simply declares types within the callback function, and hands the callback to the solver just like one would with a pure Python function. Given the frequency with which this function may be called, the act of typing the variables in the callback function, combined with the reduced call overhead of Cython-implemented Python functions, can have a noticeable impact on performance. How much depends heavily on the problem in question; as a rough indicator, we have noted a 40 times speedup when using this method on a particular ordinary differential equation in 12 variables.

For computationally simple problems in only a few variables, evaluating the function can be such a quick operation that the overhead of the Python function call for each step becomes relevant. In
Here we create a_computed which is equivalent to the matrix product u * s_diag * vt, and we verify that a and a_computed are equal to within machine precision.

When calling the routine from within Cython code, the invocation is identical, and the arguments can be typed to reduce function call overhead. Again, please see the documentation for details and examples.

Fwrap handles any kind of Fortran array declaration, whether assumed-size (like the above example), assumed-shape or explicit shape. Options exist for hiding redundant arguments (like the array dimensions LDA, LDU and LDVT above) and are covered in Fwrap's documentation.

This example covers just the basics of what Fwrap can do. For more information, downloads and help using Fwrap, see http://fwrap.sourceforge.net/. You can reach other users and the Fwrap developers on the fwrap-users mailing list, http://groups.google.com/group/fwrap-users.

Limitations

When compared to writing code in pure Python, Cython's primary disadvantages are compilation time and the need to have a separate build phase. Most projects using Cython are therefore written in a mix of Python and Cython, as Cython sources don't need to be recompiled when Python sources change. Cython can still be used to compile some of the Python modules for performance reasons. There is also an experimental "pure mode" where decorators are used to indicate static type declarations, which are valid Python and ignored by the interpreter at runtime, but are used by Cython when compiled. This combines the advantage of a fast edit-run cycle with a high runtime performance of the final product. There is also the question of code distribution. Many projects, rather than requiring Cython as a dependency, ship the generated .c files which compile against Python 2.3 to 3.2 without any modifications as part of the distutils setup phase.

Compared to compiled languages such as Fortran and C, Cython's primary limitation is the limited support for shared memory parallelism. Python is inherently limited in its multithreading capabilities, due to the use of a Global Interpreter Lock (GIL). Cython code can declare sections as only containing C code (using a nogil directive), which are then able to run in parallel. However, this can quickly become tedious. Currently there's also no support for OpenMP programming in Cython. On the other hand, message passing parallelism using multiple processes, for instance through MPI, is very well supported.

Compared to C++, a major weakness is the lack of built-in template support, which aids in writing code that works efficiently with many different data types. In Cython, one must either repeat code for each data type, or use an external templating system, in the same way that is often done for Fortran codes. Many template engines exist for Python, and most of them should work well for generating Cython code.

Using a language which can be either dynamic or static takes some experience. Cython is clearly useful when talking to external libraries, but when is it worth it to replace normal Python code with Cython code? The obvious factor to consider is the purpose of the code: is it a single experiment, for which the Cython compilation time might overshadow the pure Python run time? Or is it a core library function, where every ounce of speed matters?

It is possible to paint some broad strokes when it comes to the type of computation considered. Is the bulk of time spent doing low-level number crunching in your code, or is the heavy lifting done through calls to external libraries? How easy is it to express the computation in terms of NumPy operations? For sequential algorithms such as equation solving and statistical simulations it is indeed impossible to do without a loop of some kind. Pure Python loops can be very slow; but the impact of this still varies depending on the use case.

Further reading

If you think Cython might help you, then the next stop is the Cython Tutorial [tutorial]. [numerics] presents optimization strategies and benchmarks for computations.

As always, the online documentation at
References

[Alted] F. Alted. Why modern CPUs are starving and what can be done about it. CiSE 12, 68, 2010.

[comparison] I. M. Wilbers, H. P. Langtangen, A. Oedegaard, Using Cython to Speed up Numerical Python Programs, Proceedings of MekIT'09, 2009. URL: http://simula.no/research/sc/publications/Simula.SC.578

[fwrap] K. W. Smith, D. S. Seljebotn, Fwrap: Fortran wrappers in C, Cython & Python. URL: http://conference.scipy.org/abstract?id=19 Project homepage: http://fwrap.sourceforge.net/

[numerics] D. S. Seljebotn, Fast numerical computations with Cython, Proceedings of the 8th Python in Science Conference, 2009. URL: http://conference.scipy.org/proceedings/SciPy2009/paper_2

[NumPy] S. van der Walt, S. C. Colbert, G. Varoquaux, The NumPy array: a structure for efficient numerical computation, CiSE, present issue.

[Pyrex] G. Ewing, Pyrex: A language for writing Python extension modules. URL: http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/

[Sage] William A. Stein et al. Sage Mathematics Software, The Sage Development Team, 2010, http://www.sagemath.org.

[Theano] J. Bergstra. Optimized Symbolic Expressions and GPU Metaprogramming with Theano, Proceedings of the 9th Python in Science Conference (SciPy2010), Austin, Texas, June 2010.

[numexpr] D. Cooke, F. Alted, T. Hochberg, G. Thalhammer, numexpr. URL: http://code.google.com/p/numexpr/