
From Python to Numpy

Copyright (c) 2017 - Nicolas P. Rougier <[email protected]>

Latest version - May 2017


DOI: 10.5281/zenodo.225783

There are already a fair number of books about Numpy (see Bibliography) and it is legitimate to wonder whether another book is really necessary. As you may have guessed by reading these lines, my personal answer is yes, mostly because I think there is room for a different approach concentrating on the migration from Python to Numpy through vectorization. There are a lot of techniques that you won't find in books; they are mostly learned through experience. The goal of this book is to explain some of these techniques and to provide an opportunity to gain this experience in the process.
Website: http://www.labri.fr/perso/nrougier/from-python-to-numpy

Table of Contents
1. Preface
   • About the author
   • About this book
   • License
2. Introduction
   • Simple example
   • Readability vs speed
3. Anatomy of an array
   • Introduction
   • Memory layout
   • Views and copies
   • Conclusion
4. Code vectorization
   • Introduction
   • Uniform vectorization
   • Temporal vectorization
   • Spatial vectorization
   • Conclusion
5. Problem vectorization
   • Introduction
   • Path finding
   • Fluid Dynamics
   • Blue noise sampling
   • Conclusion
6. Custom vectorization
   • Introduction
   • Typed list
   • Memory aware array
   • Conclusion
7. Beyond Numpy
   • Back to Python
   • Numpy & co
   • Scipy & co
   • Conclusion
8. Conclusion
9. Quick References
   • Data type
   • Creation
   • Indexing
   • Reshaping
   • Broadcasting
10. Bibliography
   • Tutorials
   • Articles
   • Books

Disclaimer: All external pictures should have associated credits. If credits are missing, please tell me and I will correct it. Similarly, all excerpts should be sourced (Wikipedia mostly). If not, this is an error and I will correct it as soon as you tell me.
1 Preface
Contents
About the author
About this book
Prerequisites
Conventions
How to contribute
Publishing
License

1.1 About the author


Nicolas P. Rougier is a full-time research scientist at Inria, the French national institute for research in computer science and control. This is a public scientific and technological establishment (EPST) under the double supervision of the Research & Education Ministry and the Ministry of Economy, Finance and Industry. Nicolas P. Rougier works within the Mnemosyne project, which lies at the frontier between integrative and computational neuroscience, in association with the Institute of Neurodegenerative Diseases, the Bordeaux laboratory for research in computer science (LaBRI), the University of Bordeaux and the national center for scientific research (CNRS).
He has been using Python for more than 15 years and numpy for more than 10 years for modeling in neuroscience, machine learning and advanced visualization (OpenGL). Nicolas P. Rougier is the author of several online resources and tutorials (Matplotlib, numpy, OpenGL) and he teaches Python, numpy and scientific visualization at the University of Bordeaux and in various conferences and schools worldwide (SciPy, EuroScipy, etc.). He is also the author of the popular article Ten Simple Rules for Better Figures and a popular matplotlib tutorial.

1.2 About this book


This book has been written in reStructuredText format and generated using the rst2html.py command-line tool available from the docutils Python package.
If you want to rebuild the html output, from the top directory, type:

$ rst2html.py --link-stylesheet --cloak-email-addresses \


--toc-top-backlinks --stylesheet=book.css \
--stylesheet-dirs=. book.rst book.html

The sources are available from https://github.com/rougier/from-python-to-numpy.

Prerequisites

This is not a Python beginner guide and you should have an intermediate level in Python and ideally
a beginner level in numpy. If this is not the case, have a look at the bibliography for a curated list of
resources.

Conventions

We will use the usual naming conventions. Unless stated explicitly otherwise, each script should import numpy, scipy and matplotlib as:

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

We'll use up-to-date versions (at the date of writing, i.e. January 2017) of the different packages:

Package      Version
Python       3.6.0
Numpy        1.12.0
Scipy        0.18.1
Matplotlib   2.0.0
How to contribute

If you want to contribute to this book, you can:


Review chapters (please contact me)
Report issues (https://github.com/rougier/from-python-to-numpy/issues)
Suggest improvements (https://github.com/rougier/from-python-to-numpy/pulls)
Correct English (https://github.com/rougier/from-python-to-numpy/issues)
Design a better and more responsive html template for the book
Star the project (https://github.com/rougier/from-python-to-numpy)

Publishing

If you're an editor interested in publishing this book, you can contact me if you agree to have this version and all subsequent versions open access (i.e. online at this address), you know how to deal with reStructuredText (Word is not an option), you provide real added value as well as supporting services, and more importantly, you have a truly amazing LaTeX book template (and be warned that I'm a bit picky about typography & design: Edward Tufte is my hero). Still here?

1.3 License
Book
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow the license terms.
Code
The code is licensed under the OSI-approved BSD 2-Clause License.

2 Introduction
Contents
Simple example
Readability vs speed

2.1 Simple example


Note
You can execute any code below from the code folder using the regular Python shell or from inside an IPython session or Jupyter notebook. In such a case, you might want to use the magic command %timeit instead of the custom one I wrote.

Numpy is all about vectorization. If you are familiar with Python, this is the main difficulty you'll face because you'll need to change your way of thinking: your new friends (among others) are named "vectors", "arrays", "views" or "ufuncs".
Let's take a very simple example: a random walk. One possible object-oriented approach would be to define a RandomWalker class and write a walk method that would return the current position after each (random) step. It's nice, it's readable, but it is slow:
Object oriented approach

import random

class RandomWalker:
    def __init__(self):
        self.position = 0

    def walk(self, n):
        self.position = 0
        for i in range(n):
            yield self.position
            self.position += 2*random.randint(0, 1) - 1

walker = RandomWalker()
walk = [position for position in walker.walk(1000)]

Benchmarking gives us:

>>> from tools import timeit
>>> walker = RandomWalker()
>>> timeit("[position for position in walker.walk(n=10000)]", globals())
10 loops, best of 3: 15.7 msec per loop

Procedural approach
For such a simple problem, we can probably dispense with the class definition and concentrate only on the walk method that computes successive positions after each random step.

def random_walk(n):
    position = 0
    walk = [position]
    for i in range(n):
        position += 2*random.randint(0, 1)-1
        walk.append(position)
    return walk

walk = random_walk(1000)

This new method saves some CPU cycles but not that much because this function is pretty much the
same as in the object-oriented approach and the few cycles we saved probably come from the inner
Python object-oriented machinery.

>>> from tools import timeit
>>> timeit("random_walk(n=10000)", globals())
10 loops, best of 3: 15.6 msec per loop

Vectorized approach
But we can do better using the itertools Python module, which offers a set of functions for creating iterators for efficient looping. If we observe that a random walk is an accumulation of steps, we can rewrite the function by first generating all the steps at once and accumulating them without any loop:

def random_walk_faster(n=1000):
    from itertools import accumulate
    # random.choices is only available from Python 3.6
    steps = random.choices([-1,+1], k=n)
    return [0]+list(accumulate(steps))

walk = random_walk_faster(1000)

In fact, we've just vectorized our function. Instead of looping to pick sequential steps and add them to the current position, we first generated all the steps at once and used the accumulate function to compute all the positions. We got rid of the loop and this makes things faster:

>>> from tools import timeit
>>> timeit("random_walk_faster(n=10000)", globals())
10 loops, best of 3: 2.21 msec per loop

We gained 85% in computation time compared to the previous version, which is not bad. But the advantage of this new version is that it makes numpy vectorization super simple: we only have to translate the itertools calls into numpy ones.

def random_walk_fastest(n=1000):
    # No 's' in numpy choice (Python offers choice & choices)
    steps = np.random.choice([-1,+1], n)
    return np.cumsum(steps)

walk = random_walk_fastest(1000)

Not too difficult, but we gained a factor of 500x using numpy:

>>> from tools import timeit
>>> timeit("random_walk_fastest(n=10000)", globals())
1000 loops, best of 3: 14 usec per loop

This book is about vectorization, be it at the code or problem level. We'll see this difference is
important before looking at custom vectorization.
2.2 Readability vs speed
Before heading to the next chapter, I would like to warn you about a potential problem you may encounter once you become familiar with numpy. It is a very powerful library and you can work wonders with it but, most of the time, this comes at the price of readability. If you don't comment your code at the time of writing, you won't be able to tell what a function is doing after a few weeks (or possibly days). For example, can you tell what the two functions below are doing? Probably you can tell for the first one, but unlikely for the second (or your name is Jaime Fernández del Río and you don't need to read this book).

def function_1(seq, sub):
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]

def function_2(seq, sub):
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

As you may have guessed, the second function is the vectorized-optimized-faster-numpy version of
the first function. It is 10 times faster than the pure Python version, but it is hardly readable.
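As a quick sanity check (with made-up data, not from the book) that the two functions agree, both should report that sub occurs at offsets 0 and 3 of seq:

seq = [1, 0, 1, 1, 0, 1]
sub = [1, 0, 1]
print(function_1(seq, sub))                      # [0, 3]
print(function_2(np.array(seq), np.array(sub)))  # [0 3]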

3 Anatomy of an array
Contents
Introduction
Memory layout
Views and copies
Direct and indirect access
Temporary copy
Conclusion

3.1 Introduction
As explained in the Preface, you should have a basic experience with numpy to read this book. If this is not the case, you'd better start with a beginner tutorial before coming back here. Consequently, I'll only give a quick reminder on the basic anatomy of numpy arrays, especially regarding the memory layout, views, copies and the data type. These are critical notions to understand if you want your computation to benefit from the numpy philosophy.
Let's consider a simple example where we want to clear all the values from an array whose dtype is np.float32. How does one write it to maximize speed? The syntax below is rather obvious (at least for those familiar with numpy), but the question remains: is it the fastest way?

>>> Z = np.ones(4*1000000, np.float32)
>>> Z[...] = 0

If you look more closely at both the dtype and the size of the array, you can observe that this array can be cast (i.e. viewed) as many other "compatible" data types. By compatible, I mean that Z.size * Z.itemsize can be divided by the new dtype's itemsize.

>>> timeit("Z.view(np.float16)[...] = 0", globals())


100 loops, best of 3: 2.72 msec per loop
>>> timeit("Z.view(np.int16)[...] = 0", globals())
100 loops, best of 3: 2.77 msec per loop
>>> timeit("Z.view(np.int32)[...] = 0", globals())
100 loops, best of 3: 1.29 msec per loop
>>> timeit("Z.view(np.float32)[...] = 0", globals())
100 loops, best of 3: 1.33 msec per loop
>>> timeit("Z.view(np.int64)[...] = 0", globals())
100 loops, best of 3: 874 usec per loop
>>> timeit("Z.view(np.float64)[...] = 0", globals())
100 loops, best of 3: 865 usec per loop
>>> timeit("Z.view(np.complex128)[...] = 0", globals())
100 loops, best of 3: 841 usec per loop
>>> timeit("Z.view(np.int8)[...] = 0", globals())
100 loops, best of 3: 630 usec per loop
Interestingly enough, the obvious way of clearing all the values is not the fastest. By casting the array into a larger data type such as np.float64, we gained a 25% speed factor. But by viewing the array as a byte array (np.int8), we gained a 50% factor. The reasons for such speedups are to be found in the internal numpy machinery and compiler optimizations. This simple example illustrates the philosophy of numpy, as we'll see in the next section.

3.2 Memory layout


The numpy documentation defines the ndarray class very clearly:
An instance of class ndarray consists of a contiguous one-dimensional segment of computer
memory (owned by the array, or by some other object), combined with an indexing scheme that
maps N integers into the location of an item in the block.

Said differently, an array is mostly a contiguous block of memory whose parts can be accessed using an indexing scheme. Such an indexing scheme is in turn defined by a shape and a data type, and this is precisely what is needed when you define a new array:

Z = np.arange(9).reshape(3,3).astype(np.int16)

Here, we know that Z itemsize is 2 bytes (int16), the shape is (3,3) and the number of dimensions is 2
(len(Z.shape)).

>>> print(Z.itemsize)
2
>>> print(Z.shape)
(3, 3)
>>> print(Z.ndim)
2

Furthermore and because Z is not a view, we can deduce the strides of the array that define the
number of bytes to step in each dimension when traversing the array.

>>> strides = Z.shape[1]*Z.itemsize, Z.itemsize
>>> print(strides)
(6, 2)
>>> print(Z.strides)
(6, 2)

With all this information, we know how to access a specific item (designated by an index tuple) and, more precisely, how to compute the start and end offsets:

offset_start = 0
for i in range(Z.ndim):
    offset_start += Z.strides[i]*index[i]
offset_end = offset_start + Z.itemsize

Let's see if this is correct using the tobytes conversion method:

>>> Z = np.arange(9).reshape(3,3).astype(np.int16)
>>> index = 1, 1
>>> print(Z[index].tobytes())
b'\x04\x00'
>>> offset_start = 0
>>> for i in range(Z.ndim):
...     offset_start += Z.strides[i]*index[i]
>>> offset_end = offset_start + Z.itemsize
>>> print(Z.tobytes()[offset_start:offset_end])
b'\x04\x00'

This array can be actually considered from different perspectives (i.e. layouts):
Item layout
shape[1]
(=3)
┌───────────┐

┌ ┌───┬───┬───┐ ┐
│ │ 0 │ 1 │ 2 │ │
│ ├───┼───┼───┤ │
shape[0] │ │ 3 │ 4 │ 5 │ │ len(Z)
(=3) │ ├───┼───┼───┤ │ (=3)
│ │ 6 │ 7 │ 8 │ │
└ └───┴───┴───┘ ┘

Flattened item layout


┌───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │
└───┴───┴───┴───┴───┴───┴───┴───┴───┘

└───────────────────────────────────┘
Z.size
(=9)

Memory layout (C order, big endian)

strides[1]
(=2)
┌─────────────────────┐

┌ ┌──────────┬──────────┐ ┐
│ p+00: │ 00000000 │ 00000000 │ │
│ ├──────────┼──────────┤ │
│ p+02: │ 00000000 │ 00000001 │ │ strides[0]
│ ├──────────┼──────────┤ │ (=2x3)
│ p+04 │ 00000000 │ 00000010 │ │
│ ├──────────┼──────────┤ ┘
│ p+06 │ 00000000 │ 00000011 │
│ ├──────────┼──────────┤
Z.nbytes │ p+08: │ 00000000 │ 00000100 │
(=3x3x2) │ ├──────────┼──────────┤
│ p+10: │ 00000000 │ 00000101 │
│ ├──────────┼──────────┤
│ p+12: │ 00000000 │ 00000110 │
│ ├──────────┼──────────┤
│ p+14: │ 00000000 │ 00000111 │
│ ├──────────┼──────────┤
│ p+16: │ 00000000 │ 00001000 │
└ └──────────┴──────────┘

└─────────────────────┘
Z.itemsize
Z.dtype.itemsize
(=2)

If we now take a slice of Z, the result is a view of the base array Z:

V = Z[::2,::2]

Such a view is specified using a shape, a dtype and strides, because the strides can no longer be deduced from the dtype and shape alone:
Item layout

shape[1]
(=2)
┌───────────┐

┌ ┌───┬╌╌╌┬───┐ ┐
│ │ 0 │ │ 2 │ │ ┌───┬───┐
│ ├───┼╌╌╌┼───┤ │ │ 0 │ 2 │
shape[0] │ ╎ ╎ ╎ ╎ │ len(Z) → ├───┼───┤
(=2) │ ├───┼╌╌╌┼───┤ │ (=2) │ 6 │ 8 │
│ │ 6 │ │ 8 │ │ └───┴───┘
└ └───┴╌╌╌┴───┘ ┘

Flattened item layout

┌───┬╌╌╌┬───┬╌╌╌┬╌╌╌┬╌╌╌┬───┬╌╌╌┬───┐ ┌───┬───┬───┬───┐
│ 0 │ │ 2 │ ╎ ╎ │ 6 │ │ 8 │ → │ 0 │ 2 │ 6 │ 8 │
└───┴╌╌╌┴───┴╌╌╌┴╌╌╌┴╌╌╌┴───┴╌╌╌┴───┘ └───┴───┴───┴───┘
└─┬─┘ └─┬─┘ └─┬─┘ └─┬─┘
└───┬───┘ └───┬───┘
└───────────┬───────────┘
Z.size
(=4)

Memory layout (C order, big endian)

┌ ┌──────────┬──────────┐ ┐ ┐
┌─┤ p+00: │ 00000000 │ 00000000 │ │ │
│ └ ├──────────┼──────────┤ │ strides[1] │
┌─┤ p+02: │ │ │ │ (=4) │
│ │ ┌ ├──────────┼──────────┤ ┘ │
│ └─┤ p+04 │ 00000000 │ 00000010 │ │
│ └ ├──────────┼──────────┤ │ strides[0]
│ p+06: │ │ │ │ (=12)
│ ├──────────┼──────────┤ │
Z.nbytes ─┤ p+08: │ │ │ │
(=8) │ ├──────────┼──────────┤ │
│ p+10: │ │ │ │
│ ┌ ├──────────┼──────────┤ ┘
│ ┌─┤ p+12: │ 00000000 │ 00000110 │
│ │ └ ├──────────┼──────────┤
└─┤ p+14: │ │ │
│ ┌ ├──────────┼──────────┤
└─┤ p+16: │ 00000000 │ 00001000 │
└ └──────────┴──────────┘

└─────────────────────┘
Z.itemsize
Z.dtype.itemsize
(=2)
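As a quick check of the diagrams above (using the Z and V defined earlier), we can print the shape and strides of the view; note that the strides double in each dimension since we skip every other item:

>>> print(V.shape, V.strides)
(2, 2) (12, 4)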

3.3 Views and copies


Views and copies are important concepts for the optimization of your numerical computations. Even
if we've already manipulated them in the previous section, the whole story is a bit more complex.

Direct and indirect access

First, we have to distinguish between indexing and fancy indexing. The first will always return a view
while the second will return a copy. This difference is important because in the first case, modifying
the view modifies the base array while this is not true in the second case:

>>> Z = np.zeros(9)
>>> Z_view = Z[:3]
>>> Z_view[...] = 1
>>> print(Z)
[ 1. 1. 1. 0. 0. 0. 0. 0. 0.]
>>> Z = np.zeros(9)
>>> Z_copy = Z[[0,1,2]]
>>> Z_copy[...] = 1
>>> print(Z)
[ 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Thus, if you need fancy indexing, it's better to keep a copy of your fancy index (especially if it was complex to compute) and to work with it:

>>> Z = np.zeros(9)
>>> index = [0,1,2]
>>> Z[index] = 1
>>> print(Z)
[ 1. 1. 1. 0. 0. 0. 0. 0. 0.]

If you are unsure whether the result of your indexing is a view or a copy, you can check the base of your result. If it is None, then your result is a copy:

>>> Z = np.random.uniform(0,1,(5,5))
>>> Z1 = Z[:3,:]
>>> Z2 = Z[[0,1,2], :]
>>> print(np.allclose(Z1,Z2))
True
>>> print(Z1.base is Z)
True
>>> print(Z2.base is Z)
False
>>> print(Z2.base is None)
True

Note that some numpy functions return a view when possible (e.g. ravel) while some others always
return a copy (e.g. flatten):

>>> Z = np.zeros((5,5))
>>> Z.ravel().base is Z
True
>>> Z[::2,::2].ravel().base is Z
False
>>> Z.flatten().base is Z
False
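If checking the base attribute feels indirect, np.may_share_memory gives another (conservative) hint; a small illustration with the Z above:

>>> np.may_share_memory(Z, Z.ravel())
True
>>> np.may_share_memory(Z, Z.flatten())
False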

Temporary copy

Copies can be made explicitly like in the previous section, but the most general case is the implicit
creation of intermediate copies. This is the case when you are doing some arithmetic with arrays:

>>> X = np.ones(10, dtype=np.int)
>>> Y = np.ones(10, dtype=np.int)
>>> A = 2*X + 2*Y

In the example above, three intermediate arrays have been created: one for holding the result of 2*X, one for holding the result of 2*Y, and one for holding the result of 2*X+2*Y. In this specific case, the arrays are small enough and this does not really make a difference. However, if your arrays are big, then you have to be careful with such expressions and wonder if you can do things differently. For example, if only the final result matters and you don't need X nor Y afterwards, an alternate solution would be:

>>> X = np.ones(10, dtype=np.int)
>>> Y = np.ones(10, dtype=np.int)
>>> np.multiply(X, 2, out=X)
>>> np.multiply(Y, 2, out=Y)
>>> np.add(X, Y, out=X)

Using this alternate solution, no temporary array is created. The problem is that there are many other cases where such copies need to be created, and this impacts performance, as demonstrated in the example below:

>>> X = np.ones(10000000, dtype=np.int)
>>> Y = np.ones(10000000, dtype=np.int)
>>> timeit("X = X + 2.0*Y", globals())
100 loops, best of 3: 3.61 ms per loop
>>> timeit("X = X + 2*Y", globals())
100 loops, best of 3: 3.47 ms per loop
>>> timeit("X += 2*Y", globals())
100 loops, best of 3: 2.79 ms per loop
>>> timeit("np.add(X, Y, out=X); np.add(X, Y, out=X)", globals())
1000 loops, best of 3: 1.57 ms per loop

3.4 Conclusion
As a conclusion, we'll make an exercise. Given two vectors Z1 and Z2, we would like to know whether Z2 is a view of Z1 and, if so, what this view is.

>>> Z1 = np.arange(10)
>>> Z2 = Z1[1:-1:2]

╌╌╌┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬╌╌
Z1 │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │
╌╌╌┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴╌╌
╌╌╌╌╌╌╌┬───┬╌╌╌┬───┬╌╌╌┬───┬╌╌╌┬───┬╌╌╌╌╌╌╌╌╌╌
Z2 │ 1 │ │ 3 │ │ 5 │ │ 7 │
╌╌╌╌╌╌╌┴───┴╌╌╌┴───┴╌╌╌┴───┴╌╌╌┴───┴╌╌╌╌╌╌╌╌╌╌

First, we need to check whether Z1 is the base of Z2:

>>> print(Z2.base is Z1)
True

At this point, we know that Z2 is a view of Z1, meaning that Z2 can be expressed as Z1[start:stop:step]. The difficulty is to find start, stop and step. For the step, we can use the strides property of any array, which gives the number of bytes to go from one element to the next in each dimension. In our case, and because both arrays are one-dimensional, we can directly compare the first stride only:

>>> step = Z2.strides[0] // Z1.strides[0]
>>> print(step)
2

The next difficulty is to find the start and the stop indices. To do this, we can take advantage of the byte_bounds method that returns a pointer to the end-points of an array.

byte_bounds(Z1)[0] byte_bounds(Z1)[-1]
↓ ↓
╌╌╌┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬╌╌
Z1 │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │
╌╌╌┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴╌╌

byte_bounds(Z2)[0] byte_bounds(Z2)[-1]
↓ ↓
╌╌╌╌╌╌╌┬───┬╌╌╌┬───┬╌╌╌┬───┬╌╌╌┬───┬╌╌╌╌╌╌╌╌╌╌
Z2 │ 1 │ │ 3 │ │ 5 │ │ 7 │
╌╌╌╌╌╌╌┴───┴╌╌╌┴───┴╌╌╌┴───┴╌╌╌┴───┴╌╌╌╌╌╌╌╌╌╌

>>> offset_start = np.byte_bounds(Z2)[0] - np.byte_bounds(Z1)[0]
>>> print(offset_start) # bytes
8

>>> offset_stop = np.byte_bounds(Z2)[-1] - np.byte_bounds(Z1)[-1]
>>> print(offset_stop) # bytes
-16
Converting these offsets into indices is straightforward using the itemsize, taking into account that offset_stop is negative (the end-bound of Z2 is logically smaller than the end-bound of Z1). We thus need to add the size of Z1 to get the right end index.

>>> start = offset_start // Z1.itemsize
>>> stop = Z1.size + offset_stop // Z1.itemsize
>>> print(start, stop, step)
1 8 2

Finally, we test our results:

>>> print(np.allclose(Z1[start:stop:step], Z2))
True

As an exercise, you can improve this first and very simple implementation by taking into account:
Negative steps
Multi-dimensional arrays
Solution to the exercise.
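Packaging the steps above into a single helper gives a minimal sketch for the 1-D, positive-step case (the extensions listed above are left as the exercise; the function name is mine, not from the book's sources):

def find_view_1d(Z1, Z2):
    # Recover (start, stop, step) such that Z2 is Z1[start:stop:step]
    # Sketch only: assumes 1-D arrays and a positive step
    if Z2.base is not Z1:
        return None
    step = Z2.strides[0] // Z1.strides[0]
    offset_start = np.byte_bounds(Z2)[0] - np.byte_bounds(Z1)[0]
    offset_stop = np.byte_bounds(Z2)[-1] - np.byte_bounds(Z1)[-1]
    start = offset_start // Z1.itemsize
    stop = Z1.size + offset_stop // Z1.itemsize
    return start, stop, step

Z1 = np.arange(10)
Z2 = Z1[1:-1:2]
print(find_view_1d(Z1, Z2))  # (1, 8, 2)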

4 Code vectorization
Contents
Introduction
Uniform vectorization
The Game of Life
Python implementation
Numpy implementation
Exercise
Sources
References
Temporal vectorization
Python implementation
Numpy implementation
Faster numpy implementation
Visualization
Exercise
Sources
References
Spatial vectorization
Boids
Python implementation
Numpy implementation
Exercise
Sources
References
Conclusion

4.1 Introduction
Code vectorization means that the problem you're trying to solve is inherently vectorizable and only requires a few numpy tricks to make it faster. Of course, it does not mean it is easy or straightforward, but at least it does not necessitate totally rethinking your problem (as will be the case in the Problem vectorization chapter). Still, it may require some experience to see where code can be vectorized. Let's illustrate this through a simple example where we want to sum up two lists of integers. One simple way using pure Python is:

def add_python(Z1, Z2):
    return [z1+z2 for (z1,z2) in zip(Z1,Z2)]

This first naive solution can be vectorized very easily using numpy:

def add_numpy(Z1, Z2):
    return np.add(Z1, Z2)

Without any surprise, benchmarking the two approaches shows that the second method is the fastest, by more than one order of magnitude:

>>> Z1 = random.sample(range(1000), 100)
>>> Z2 = random.sample(range(1000), 100)
>>> timeit("add_python(Z1, Z2)", globals())
1000 loops, best of 3: 68 usec per loop
>>> timeit("add_numpy(Z1, Z2)", globals())
10000 loops, best of 3: 1.14 usec per loop

Not only is the second approach faster, but it also naturally adapts to the shape of Z1 and Z2. This is
the reason why we did not write Z1 + Z2 because it would not work if Z1 and Z2 were both lists. In
the first Python method, the inner + is interpreted differently depending on the nature of the two
objects such that if we consider two nested lists, we get the following outputs:

>>> Z1 = [[1, 2], [3, 4]]
>>> Z2 = [[5, 6], [7, 8]]
>>> Z1 + Z2
[[1, 2], [3, 4], [5, 6], [7, 8]]
>>> add_python(Z1, Z2)
[[1, 2, 5, 6], [3, 4, 7, 8]]
>>> add_numpy(Z1, Z2)
[[ 6  8]
 [10 12]]

The first method concatenates the two lists together, the second method concatenates the internal
lists together and the last one computes what is (numerically) expected. As an exercise, you can
rewrite the Python version such that it accepts nested lists of any depth.
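A possible sketch of such a solution (one way among many, using recursion on nested lists):

def add_recursive(Z1, Z2):
    # Recurse into nested lists of any depth, adding leaf elements
    if isinstance(Z1, list):
        return [add_recursive(z1, z2) for z1, z2 in zip(Z1, Z2)]
    return Z1 + Z2

print(add_recursive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]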

4.2 Uniform vectorization


Uniform vectorization is the simplest form of vectorization, where all the elements share the same computation at every time step with no specific processing for any element. One stereotypical case is the Game of Life, invented by John Conway (see below), which is one of the earliest examples of cellular automata. Cellular automata can be conveniently regarded as arrays of cells connected together through the notion of neighbours, and their vectorization is straightforward. Let me first define the game and we'll see how to vectorize it.

Figure 4.1
Conus textile snail exhibits a cellular automaton pattern on its shell. Image by Richard Ling,
2005.

The Game of Life

Note
Excerpt from the Wikipedia entry on the Game of Life

The Game of Life is a cellular automaton devised by the British mathematician John Horton Conway
in 1970. It is the best-known example of a cellular automaton. The "game" is actually a zero-player
game, meaning that its evolution is determined by its initial state, needing no input from human
players. One interacts with the Game of Life by creating an initial configuration and observing how it
evolves.
The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each
of which is in one of two possible states, live or dead. Every cell interacts with its eight neighbours,
which are the cells that are directly horizontally, vertically, or diagonally adjacent. At each step in
time, the following transitions occur:
1. Any live cell with fewer than two live neighbours dies, as if caused by underpopulation.
2. Any live cell with more than three live neighbours dies, as if by overcrowding.
3. Any live cell with two or three live neighbours lives, unchanged, to the next generation.
4. Any dead cell with exactly three live neighbours becomes a live cell.
The initial pattern constitutes the 'seed' of the system. The first generation is created by applying the
above rules simultaneously to every cell in the seed – births and deaths happen simultaneously, and
the discrete moment at which this happens is sometimes called a tick. (In other words, each
generation is a pure function of the one before.) The rules continue to be applied repeatedly to
create further generations.

Python implementation

Note
We could have used the more efficient python array interface but it is more convenient to use
the familiar list object.

In pure Python, we can code the Game of Life using a list of lists representing the board where cells are supposed to evolve. Such a board is equipped with a border of 0s, which allows us to accelerate things a bit by avoiding specific tests for borders when counting the number of neighbours.

Z = [[0,0,0,0,0,0],
     [0,0,0,1,0,0],
     [0,1,0,1,0,0],
     [0,0,1,1,0,0],
     [0,0,0,0,0,0],
     [0,0,0,0,0,0]]

Taking the border into account, counting neighbours then is straightforward:

def compute_neighbours(Z):
    shape = len(Z), len(Z[0])
    N = [[0,]*(shape[1]) for i in range(shape[0])]
    for x in range(1,shape[0]-1):
        for y in range(1,shape[1]-1):
            N[x][y] = Z[x-1][y-1]+Z[x][y-1]+Z[x+1][y-1] \
                    + Z[x-1][y]            +Z[x+1][y]   \
                    + Z[x-1][y+1]+Z[x][y+1]+Z[x+1][y+1]
    return N

To iterate one step in time, we then simply count the number of neighbours for each internal cell and
we update the whole board according to the four aforementioned rules:

def iterate(Z):
    shape = len(Z), len(Z[0])
    N = compute_neighbours(Z)
    for x in range(1,shape[0]-1):
        for y in range(1,shape[1]-1):
            if Z[x][y] == 1 and (N[x][y] < 2 or N[x][y] > 3):
                Z[x][y] = 0
            elif Z[x][y] == 0 and N[x][y] == 3:
                Z[x][y] = 1
    return Z

The figure below shows four iterations on a 4x4 area where the initial state is a glider, a structure
discovered by Richard K. Guy in 1970.

Figure 4.2
The glider pattern is known to replicate itself one step diagonally in 4 iterations.

Numpy implementation

Starting from the Python version, the vectorization of the Game of Life requires two parts: one responsible for counting the neighbours and one responsible for enforcing the rules. Neighbour-counting is relatively easy if we remember that we took care of adding a null border around the arena. By considering partial views of the arena, we can actually access neighbours quite intuitively, as illustrated below for the one-dimensional case:

┏━━━┳━━━┳━━━┓───┬───┐
Z[:-2] ┃ 0 ┃ 1 ┃ 1 ┃ 1 │ 0 │ (left neighbours)
┗━━━┻━━━┻━━━┛───┴───┘
↓︎
┌───┏━━━┳━━━┳━━━┓───┐
Z[1:-1] │ 0 ┃ 1 ┃ 1 ┃ 1 ┃ 0 │ (actual cells)
└───┗━━━┻━━━┻━━━┛───┘

┌───┬───┏━━━┳━━━┳━━━┓
Z[+2:] │ 0 │ 1 ┃ 1 ┃ 1 ┃ 0 ┃ (right neighbours)
└───┴───┗━━━┻━━━┻━━━┛

Going to the two dimensional case requires just a bit of arithmetic to make sure to consider all the
eight neighbours.

N = np.zeros(Z.shape, dtype=int)
N[1:-1,1:-1] += (Z[ :-2, :-2] + Z[ :-2,1:-1] + Z[ :-2,2:] +
                 Z[1:-1, :-2]                + Z[1:-1,2:] +
                 Z[2:  , :-2] + Z[2:  ,1:-1] + Z[2:  ,2:])

For the rule enforcement, we can write a first version using numpy's argwhere method that will give us the indices where a given condition is True:

# Flatten arrays
N_ = N.ravel()
Z_ = Z.ravel()

# Apply rules
R1 = np.argwhere( (Z_==1) & (N_ < 2) )
R2 = np.argwhere( (Z_==1) & (N_ > 3) )
R3 = np.argwhere( (Z_==1) & ((N_==2) | (N_==3)) )
R4 = np.argwhere( (Z_==0) & (N_==3) )

# Set new values
Z_[R1] = 0
Z_[R2] = 0
Z_[R3] = Z_[R3]
Z_[R4] = 1

# Make sure borders stay null
Z[0,:] = Z[-1,:] = Z[:,0] = Z[:,-1] = 0

Even if this first version does not use nested loops, it is far from optimal because of the four argwhere calls, which may be quite slow. We can instead factorize the rules into cells that will survive (stay at 1) and cells that will give birth. For doing this, we can take advantage of numpy's boolean capabilities and write quite naturally:

Note
We did not write Z = 0, as this would simply assign the value 0 to the name Z, which would then become a simple scalar.

birth = (N==3)[1:-1,1:-1] & (Z[1:-1,1:-1]==0)
survive = ((N==2) | (N==3))[1:-1,1:-1] & (Z[1:-1,1:-1]==1)
Z[...] = 0
Z[1:-1,1:-1][birth | survive] = 1

If you look at the birth and survive lines, you'll see that these two variables are boolean arrays that can be used to set Z values to 1 after the board has been cleared.
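Putting the two parts together, a full iteration step assembled from the fragments above might look like this (a sketch; the function name is mine, not from the book's sources):

def iterate_numpy(Z):
    # Count neighbours (Z is assumed to carry a null border)
    N = np.zeros(Z.shape, dtype=int)
    N[1:-1,1:-1] += (Z[ :-2, :-2] + Z[ :-2,1:-1] + Z[ :-2,2:] +
                     Z[1:-1, :-2]                + Z[1:-1,2:] +
                     Z[2:  , :-2] + Z[2:  ,1:-1] + Z[2:  ,2:])
    # Apply the factorized rules
    birth = (N==3)[1:-1,1:-1] & (Z[1:-1,1:-1]==0)
    survive = ((N==2) | (N==3))[1:-1,1:-1] & (Z[1:-1,1:-1]==1)
    Z[...] = 0
    Z[1:-1,1:-1][birth | survive] = 1
    return Z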

Figure 4.3
The Game of Life. Gray levels indicate how much a cell has been active in the past.

Exercise

Reaction and diffusion of chemical species can produce a variety of patterns, reminiscent of those often seen in nature. The Gray-Scott equations model such a reaction. For more information on this chemical system, see the article Complex Patterns in a Simple System (John E. Pearson, Science, Volume 261, 1993). Let's consider two chemical species U and V with respective concentrations u and v and diffusion rates Du and Dv. V is converted into P with a rate of conversion k. f represents the rate of the process that feeds U and drains U, V and P. This can be written as:

Chemical reaction            Equations

U + 2V → 3V                  u̇ = Du ∇²u − u·v² + f·(1 − u)
V → P                        v̇ = Dv ∇²v + u·v² − (f + k)·v

Based on the Game of Life example, try to implement such a reaction-diffusion system (a sketch of the key ingredient follows the parameter table below). Here is a set of interesting parameters to test:

Name Du Dv f k
Bacteria 1 0.16 0.08 0.035 0.065
Bacteria 2 0.14 0.06 0.035 0.065
Coral 0.16 0.08 0.060 0.062
Fingerprint 0.19 0.05 0.060 0.062
Spirals 0.10 0.10 0.018 0.050
Spirals Dense 0.12 0.08 0.020 0.050
Spirals Fast 0.10 0.16 0.020 0.050
Unstable 0.16 0.08 0.020 0.055
Worms 1 0.16 0.08 0.050 0.065
Worms 2 0.16 0.08 0.054 0.063
Zebrafish 0.16 0.08 0.035 0.060
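The only genuinely new ingredient compared to the Game of Life is the Laplacian ∇². A minimal sketch (my illustration, not the book's solution), assuming u and v carry a one-cell border like the Game of Life board, can use the same slicing trick:

def laplacian(Z):
    # 5-point discrete Laplacian (up + down + left + right - 4*center),
    # computed over the interior of an array with a one-cell border
    return (Z[ :-2, 1:-1] + Z[2:  , 1:-1] +
            Z[1:-1,  :-2] + Z[1:-1, 2:  ] - 4*Z[1:-1, 1:-1])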

The figure below shows some animations of the model for a specific set of parameters.

Figure 4.4
Reaction-diffusion Gray-Scott model. From left to right, Bacteria 1, Coral and Spiral Dense.


Sources
game_of_life_python.py
game_of_life_numpy.py
gray_scott.py (solution to the exercise)

References

John Conway new solitaire game "life", Martin Gardner, Scientific American 223, 1970.
Gray Scott Model of Reaction Diffusion, Abelson, Adams, Coore, Hanson, Nagpal, Sussman, 1997.
Reaction-Diffusion by the Gray-Scott Model, Robert P. Munafo, 1996.

4.3 Temporal vectorization


The Mandelbrot set is the set of complex numbers c for which the function f_c(z) = z² + c does not diverge when iterated from z = 0, i.e., for which the sequence f_c(0), f_c(f_c(0)), etc., remains bounded in absolute value. It is very easy to compute, but it can take a very long time because you need to ensure that a given number does not diverge. This is generally done by iterating the computation up to a maximum number of iterations, after which, if the number is still within some bounds, it is considered non-divergent. Of course, the more iterations you do, the more precision you get.

Figure 4.5
Romanesco broccoli, showing self-similar form approximating a natural fractal. Image by Jon
Sullivan, 2004.

Python implementation

A pure Python implementation is written as:

def mandelbrot_python(xmin, xmax, ymin, ymax, xn, yn, maxiter, horizon=2.0):
    def mandelbrot(z, maxiter):
        c = z
        for n in range(maxiter):
            if abs(z) > horizon:
                return n
            z = z*z + c
        return maxiter
    r1 = [xmin+i*(xmax-xmin)/xn for i in range(xn)]
    r2 = [ymin+i*(ymax-ymin)/yn for i in range(yn)]
    return [mandelbrot(complex(r, i), maxiter) for r in r1 for i in r2]

The interesting (and slow) part of this code is the mandelbrot function that actually computes the sequence f_c(f_c(f_c(...))). The vectorization of such code is not totally straightforward because the internal return implies a differential processing of the elements. Once an element has diverged, we don't need to iterate it any more and we can safely return the iteration count at divergence. The problem is to do the same in numpy. But how?

Numpy implementation

The trick is to search at each iteration for the values that have not yet diverged and to update the relevant information for these values and these values only. Because we start from Z = 0, we know that each value will be updated at least once (when it is equal to 0, it has not yet diverged) and will stop being updated as soon as it has diverged. To do that, we'll use numpy fancy indexing with the less(x1,x2) function that returns the truth value of (x1 < x2) element-wise.
def mandelbrot_numpy(xmin, xmax, ymin, ymax, xn, yn, maxiter, horizon=2.0):
    X = np.linspace(xmin, xmax, xn, dtype=np.float32)
    Y = np.linspace(ymin, ymax, yn, dtype=np.float32)
    C = X + Y[:,None]*1j
    N = np.zeros(C.shape, dtype=int)
    Z = np.zeros(C.shape, np.complex64)
    for n in range(maxiter):
        I = np.less(abs(Z), horizon)
        N[I] = n
        Z[I] = Z[I]**2 + C[I]
    N[N == maxiter-1] = 0
    return Z, N

Here is the benchmark:

>>> xmin, xmax, xn = -2.25, +0.75, int(3000/3)
>>> ymin, ymax, yn = -1.25, +1.25, int(2500/3)
>>> maxiter = 200
>>> timeit("mandelbrot_python(xmin, xmax, ymin, ymax, xn, yn, maxiter)", globals())
1 loops, best of 3: 6.1 sec per loop
>>> timeit("mandelbrot_numpy(xmin, xmax, ymin, ymax, xn, yn, maxiter)", globals())
1 loops, best of 3: 1.15 sec per loop

Faster numpy implementation

The gain is roughly a 5x factor, not as much as we could have expected. Part of the problem is that the np.less function implies xn × yn tests at every iteration, even though we know that some values have already diverged. Even if these tests are performed at the C level (through numpy), the cost is nonetheless significant. Another approach, proposed by Dan Goodman, is to work on a dynamic array that at each iteration stores only the points which have not yet diverged. It requires more lines of code, but the result is faster and leads to a 10x speed improvement compared to the Python version.

def mandelbrot_numpy_2(xmin, xmax, ymin, ymax, xn, yn, itermax, horizon=2.0):
    Xi, Yi = np.mgrid[0:xn, 0:yn]
    Xi, Yi = Xi.astype(np.uint32), Yi.astype(np.uint32)
    X = np.linspace(xmin, xmax, xn, dtype=np.float32)[Xi]
    Y = np.linspace(ymin, ymax, yn, dtype=np.float32)[Yi]
    C = X + Y*1j
    N_ = np.zeros(C.shape, dtype=np.uint32)
    Z_ = np.zeros(C.shape, dtype=np.complex64)
    Xi.shape = Yi.shape = C.shape = xn*yn

    Z = np.zeros(C.shape, np.complex64)
    for i in range(itermax):
        if not len(Z): break

        # Compute for relevant points only
        np.multiply(Z, Z, Z)
        np.add(Z, C, Z)

        # Failed convergence
        I = abs(Z) > horizon
        N_[Xi[I], Yi[I]] = i+1
        Z_[Xi[I], Yi[I]] = Z[I]

        # Keep going with those who have not diverged yet
        # (boolean negation of I; np.logical_not in recent numpy)
        np.negative(I, I)
        Z = Z[I]
        Xi, Yi = Xi[I], Yi[I]
        C = C[I]
    return Z_.T, N_.T

The benchmark gives us:

>>> timeit("mandelbrot_numpy_2(xmin, xmax, ymin, ymax, xn, yn, maxiter)", globals())
1 loops, best of 3: 510 msec per loop

Visualization

In order to visualize our results, we could directly display the N array using the matplotlib imshow
command, but this would result in a "banded" image that is a known consequence of the escape
count algorithm that we've been using. Such banding can be eliminated by using a fractional escape
count. This can be done by measuring how far the iterated point landed outside of the escape cutoff.
See the reference below about the renormalization of the escape count. Here is a picture of the result
where we use recount normalization, and added a power normalized color map (gamma=0.3) as well
as light shading.
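A common way to compute such a fractional escape count (a sketch, assuming Z and N are the arrays returned by mandelbrot_numpy_2) measures how far past the horizon the last iterate landed:

with np.errstate(invalid='ignore', divide='ignore'):
    # Smooth (fractional) iteration count: log2(log|z|) measures
    # the overshoot beyond the escape cutoff
    M = np.nan_to_num(N + 1 - np.log2(np.log(abs(Z))))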

Figure 4.6
The Mandelbrot as rendered by matplotlib using recount normalization, power normalized
color map (gamma=0.3) and light shading.
Exercise

Note
You should look at the ufunc.reduceat method that performs a (local) reduce with specified
slices over a single axis.

We now want to measure the fractal dimension of the Mandelbrot set using the Minkowski–Bouligand
dimension. To do that, we need to do box-counting with a decreasing box size (see figure below). As
you can imagine, we cannot use pure Python because it would be way too slow. The goal of the
exercise is to write a function using numpy that takes a two-dimensional float array and returns the
dimension. We'll consider values in the array to be normalized (i.e. all values are between 0 and 1).

Figure 4.7
The Minkowski–Bouligand dimension of the Great Britain coastlines is approximately 1.24.
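As a hint, the core counting step might look like the following sketch (my illustration, not necessarily the book's solution), which uses reduceat as suggested in the note, on a thresholded binary image Z and a box size k:

def boxcount(Z, k):
    # Sum values inside each k x k box, then count the boxes that are
    # partially filled (0 < sum < k*k), i.e. crossed by the boundary
    S = np.add.reduceat(
            np.add.reduceat(Z, np.arange(0, Z.shape[0], k), axis=0),
            np.arange(0, Z.shape[1], k), axis=1)
    return len(np.where((S > 0) & (S < k*k))[0])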

Sources

mandelbrot.py
mandelbrot_python.py
mandelbrot_numpy_1.py
mandelbrot_numpy_2.py
fractal_dimension.py (solution to the exercise)

References

How To Quickly Compute the Mandelbrot Set in Python, Jean Francois Puget, 2015.
My Christmas Gift: Mandelbrot Set Computation In Python, Jean Francois Puget, 2015.
Fast fractals with Python and Numpy, Dan Goodman, 2009.
Renormalizing the Mandelbrot Escape, Linas Vepstas, 1997.

4.4 Spatial vectorization


Spatial vectorization refers to a situation where elements share the same computation but are in interaction with only a subgroup of other elements. This was already the case for the Game of Life example, but in some situations there is an added difficulty because the subgroup is dynamic and needs to be updated at each iteration. This is the case, for example, in particle systems where particles interact mostly with local neighbours. This is also the case for "boids" that simulate flocking behaviors.

Figure 4.8
Flocking birds are an example of self-organization in biology. Image by Christoffer A
Rasmussen, 2012.

Boids

Note
Excerpt from the Wikipedia entry Boids

Boids is an artificial life program, developed by Craig Reynolds in 1986, which simulates the flocking
behaviour of birds. The name "boid" corresponds to a shortened version of "bird-oid object", which
refers to a bird-like object.
As with most artificial life simulations, Boids is an example of emergent behavior; that is, the
complexity of Boids arises from the interaction of individual agents (the boids, in this case) adhering
to a set of simple rules. The rules applied in the simplest Boids world are as follows:
separation: steer to avoid crowding local flock-mates
alignment: steer towards the average heading of local flock-mates
cohesion: steer to move toward the average position (center of mass) of local flock-mates

Figure 4.9
Boids are governed by a set of three local rules (separation, cohesion and alignment) that serve to compute velocity and acceleration.

Python implementation
Since each boid is an autonomous entity with several properties such as position and velocity, it
seems natural to start by writing a Boid class:

import math
import random
from vec2 import vec2

class Boid:
    def __init__(self, x=0, y=0):
        self.position = vec2(x, y)
        angle = random.uniform(0, 2*math.pi)
        self.velocity = vec2(math.cos(angle), math.sin(angle))
        self.acceleration = vec2(0, 0)

The vec2 object is a very simple class that handles all common vector operations with 2 components. It will save us some writing in the main Boid class. Note that there are some vector packages in the Python Package Index, but they would be overkill for such a simple example.
Boids are a difficult case for regular Python because a boid interacts with local neighbours. However, because boids are moving, finding such local neighbours requires computing, at each time step, the distance to every other boid in order to sort out those which are within a given interaction radius. The prototypical way of writing the three rules is thus something like:

def separation(self, boids):
    count = 0
    for other in boids:
        d = (self.position - other.position).length()
        if 0 < d < desired_separation:
            count += 1
            ...
    if count > 0:
        ...

def alignment(self, boids): ...
def cohesion(self, boids): ...

Full sources are given in the references section below; they would be too long to describe here and there is no real difficulty.
To complete the picture, we can also create a Flock object:

class Flock:
    def __init__(self, count=150):
        self.boids = []
        for i in range(count):
            boid = Boid()
            self.boids.append(boid)

    def run(self):
        for boid in self.boids:
            boid.run(self.boids)

Using this approach, we can have up to 50 boids until the computation time becomes too slow for a smooth animation. As you may have guessed, we can do much better using numpy, but let me first point out the main problem with this Python implementation. If you look at the code, you will certainly notice there is a lot of redundancy. More precisely, we do not exploit the fact that the Euclidean distance is symmetric, that is, |x − y| = |y − x|. In this naive Python implementation, each rule (function) computes n² distances while n²/2 would be sufficient if properly cached. Furthermore, each rule re-computes every distance without caching the result for the other functions. In the end, we are computing 3n² distances instead of n²/2.

Numpy implementation

As you might expect, the numpy implementation takes a different approach and we'll gather all our
boids into a position array and a velocity array:

n = 500
velocity = np.zeros((n, 2), dtype=np.float32)
position = np.zeros((n, 2), dtype=np.float32)

The first step is to compute the local neighborhood for all boids, and for this we need to compute all pairwise distances:

dx = np.subtract.outer(position[:, 0], position[:, 0])
dy = np.subtract.outer(position[:, 1], position[:, 1])
distance = np.hypot(dx, dy)

We could have used the scipy cdist function, but we'll need the dx and dy arrays later. Once those have been computed, it is faster to use the hypot method. Note that the distance shape is (n, n) and that each line relates to one boid, i.e. each line gives the distance to all other boids (including itself).
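For reference, the same distance matrix could be obtained with scipy (a minimal equivalent, shown only for comparison since we keep dx and dy for the separation rule):

from scipy.spatial.distance import cdist
distance = cdist(position, position)  # (n, n) pairwise Euclidean distances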
From these distances, we can now compute the local neighbourhood for each of the three rules, taking advantage of the fact that we can mix them together. We can actually compute a mask for distances that are strictly positive (i.e. no self-interaction) and multiply it with the other distance masks.

Note
If we suppose that boids cannot occupy the same position, how can you compute mask_0
more efficiently?

mask_0 = (distance > 0)
mask_1 = (distance < 25)
mask_2 = (distance < 50)
mask_1 *= mask_0
mask_2 *= mask_0
mask_3 = mask_2

Then, we compute the number of neighbours within the given radius and we ensure it is at least 1 to
avoid division by zero.

mask_1_count = np.maximum(mask_1.sum(axis=1), 1)
mask_2_count = np.maximum(mask_2.sum(axis=1), 1)
mask_3_count = mask_2_count

We're ready to write our three rules:

Alignment

# Compute the average velocity of local neighbours
# ('mask' and 'count' stand for the current rule's neighbourhood mask
# and count, e.g. mask_2 and mask_2_count for alignment)
target = np.dot(mask, velocity)/count.reshape(n, 1)

# Normalize the result
norm = np.sqrt((target*target).sum(axis=1)).reshape(n, 1)
np.divide(target, norm, out=target, where=norm != 0)

# Alignment at constant speed (max_velocity)
target *= max_velocity

# Compute the resulting steering
alignment = target - velocity

Cohesion

# Compute the gravity center of local neighbours
center = np.dot(mask, position)/count.reshape(n, 1)

# Compute direction toward the center
target = center - position

# Normalize the result
norm = np.sqrt((target*target).sum(axis=1)).reshape(n, 1)
np.divide(target, norm, out=target, where=norm != 0)

# Cohesion at constant speed (max_velocity)
target *= max_velocity

# Compute the resulting steering
cohesion = target - velocity

Separation

# Compute the repulsion force from local neighbours
repulsion = np.dstack((dx, dy))

# Force is inversely proportional to the distance
repulsion = np.divide(repulsion, distance.reshape(n, n, 1)**2, out=repulsion,
                      where=distance.reshape(n, n, 1) != 0)

# Compute direction away from others
target = (repulsion*mask.reshape(n, n, 1)).sum(axis=1)/count.reshape(n, 1)

# Normalize the result
norm = np.sqrt((target*target).sum(axis=1)).reshape(n, 1)
np.divide(target, norm, out=target, where=norm != 0)

# Separation at constant speed (max_velocity)
target *= max_velocity

# Compute the resulting steering
separation = target - velocity
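Finally, the three steering vectors have to be combined and applied. A possible integration step, with purely illustrative weights that are not from the book, could be:

# Combine the three rules into an acceleration and integrate
# (the weights below are illustrative only)
acceleration = 1.5*separation + alignment + cohesion
velocity += acceleration
position += velocity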
