Maximising MicroPython Speed
Contents
Algorithms
RAM allocation
Buffers
Floating point
Arrays
Identifying the slowest section of code
MicroPython code improvements
The process of developing high performance code comprises the following stages which
should be performed in the order listed.
Algorithms
The most important aspect of designing any routine for performance is ensuring that the
best algorithm is employed. This is a topic for textbooks rather than for a MicroPython
guide but spectacular performance gains can sometimes be achieved by adopting
algorithms known for their efficiency.
RAM allocation
Buffers
An example of the above is the common case where a buffer is required, such as one
used for communication with a device. A typical driver will create the buffer in the
constructor and use it in its I/O methods which will be called repeatedly.
The MicroPython libraries typically provide support for pre-allocated buffers. For
example, objects which support the stream interface (e.g., file or UART)
provide a read() method which allocates a new buffer for the data read, but also
a readinto() method which reads data into an existing buffer.
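As a minimal sketch of this pattern, the driver below allocates its buffer once in the constructor and reuses it on every read. The class name ChunkReader and the use of io.BytesIO as a stand-in for a UART or file are assumptions made for illustration; any object offering a readinto() method should behave similarly.

```python
import io


class ChunkReader:
    """Hypothetical driver that reuses one pre-allocated receive buffer."""

    def __init__(self, stream, size=16):
        self.stream = stream
        self.buf = bytearray(size)  # allocated once, in the constructor

    def read_chunk(self):
        # readinto() fills the existing buffer and returns the byte count,
        # so no new buffer is allocated for the data on each call.
        return self.stream.readinto(self.buf)


# io.BytesIO stands in for a UART or file object here (an assumption).
reader = ChunkReader(io.BytesIO(b"hello world"), size=8)
first = reader.read_chunk()   # number of bytes read on the first call
second = reader.read_chunk()  # remaining bytes, written into the same buffer
```

After each call only the first n bytes of the buffer hold fresh data, where n is the value returned by readinto(); the rest of the buffer keeps whatever an earlier read left there.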
Floating point
Some MicroPython ports allocate floating point numbers on the heap. Some other ports may
lack a dedicated floating-point coprocessor, and perform arithmetic operations on them in
“software” at considerably lower speed than on integers. Where performance is
important, use integer operations and restrict the use of floating point to sections of the
code where performance is not paramount. For example, capture ADC readings as
integer values to an array in one quick go, and only then convert them to floating-point
numbers for signal processing.
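The two-phase approach described above might be sketched as follows. The function read_adc_raw() is a hypothetical stand-in for a real ADC read, and the 12-bit resolution and 3.3 V reference are assumptions chosen only to make the example concrete.

```python
from array import array

N_SAMPLES = 8
samples = array("H", [0] * N_SAMPLES)  # pre-allocated unsigned 16-bit storage


def read_adc_raw(i):
    # Hypothetical stand-in for a real ADC read; returns a 12-bit integer.
    return (i * 500) & 0xFFF


def capture(buf):
    # Time-critical phase: integer readings stored into existing storage.
    for i in range(len(buf)):
        buf[i] = read_adc_raw(i)


def to_volts(buf, vref=3.3, full_scale=4095):
    # Non-critical phase: convert to floating point after the capture.
    return [raw * vref / full_scale for raw in buf]


capture(samples)
volts = to_volts(samples)
```

The capture loop touches only integers and an existing array, so the floating-point work is deferred to a point where speed matters less.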
Arrays
Consider the use of the various types of array classes as an alternative to lists.
The array module supports various element types, with 8-bit elements supported by
Python’s built-in bytes and bytearray classes. These data structures all store elements in
contiguous memory locations. Once again, to avoid memory allocation in critical code,
these should be pre-allocated and passed as arguments or as bound objects.
When passing slices of objects such as bytearray instances, Python creates a copy, which
involves an allocation proportional to the size of the slice. This can be alleviated
using a memoryview object. The memoryview itself is allocated on the heap, but is a small,
fixed-size object, regardless of the size of the slice it points to.
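The difference between a slice copy and a memoryview slice can be seen in a short sketch. The helper fill() is a hypothetical function used only to show that writes through the view reach the original buffer, while writes to the copy do not.

```python
buf = bytearray(b"0123456789")

copy = buf[2:6]           # slicing a bytearray allocates a new 4-byte object
view = memoryview(buf)    # small object referring to the same memory
window = view[2:6]        # slicing the view does not copy the data


def fill(target, value):
    # Writes in place into whatever buffer-like object it is given.
    for i in range(len(target)):
        target[i] = value


fill(copy, ord("x"))      # changes only the copy
fill(window, ord("-"))    # changes bytes 2..5 of the original buffer
```

After these calls buf holds b"01----6789", while copy holds b"xxxx" and has no connection to buf.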
class foo(object):
    def __init__(self):
        self.ba = bytearray(100)

    def bar(self, obj_display):
        ba_ref = self.ba
        fb = obj_display.framebuffer
        # iterative code using these two objects
This avoids the need to look up self.ba and obj_display.framebuffer repeatedly in the
body of the method bar().
@micropython.native
def foo(self, arg):
    buf = self.linebuf  # Cached object
    # code
There are certain limitations in the current implementation of the native code emitter.
Like the Native emitter, Viper produces machine instructions, but further optimisations
are performed, substantially increasing performance, especially for integer arithmetic and
bit manipulations. It is invoked using a decorator:
@micropython.viper
def foo(self, arg: int) -> int:
    # code
As the above fragment illustrates, it is beneficial to use Python type hints to assist the
Viper optimiser. Type hints provide information on the data types of arguments and of
the return value; these are a standard Python language feature formally defined
in PEP 484. Viper supports its own set of types, namely int, uint (unsigned
integer), ptr, ptr8, ptr16 and ptr32. The ptrX types are discussed below. Currently
the uint type serves a single purpose: as a type hint for a function return value. If such a
function returns 0xffffffff, Python will interpret the result as 2**32 - 1 rather than as -1.
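The two readings of the same 32-bit pattern can be illustrated in ordinary Python. This is not Viper code, and the helper names as_signed32() and as_unsigned32() are hypothetical; the sketch only shows why the choice of int or uint as the return hint changes the value seen by the caller.

```python
def as_signed32(value):
    # Interpret the low 32 bits as a signed two's complement integer.
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def as_unsigned32(value):
    # Interpret the low 32 bits as an unsigned integer.
    return value & 0xFFFFFFFF


raw = 0xFFFFFFFF
signed = as_signed32(raw)      # the reading an int return hint corresponds to
unsigned = as_unsigned32(raw)  # the reading a uint return hint corresponds to
```

Here signed is -1 and unsigned is 4294967295, which is 2**32 - 1.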
In addition to the restrictions imposed by the native emitter the following constraints
apply:
@micropython.viper
def foo(self, arg: int) -> int:
    buf = ptr8(self.linebuf)  # self.linebuf is a bytearray or bytes object
    for x in range(20, 30):
        bar = buf[x]  # Access a data item through the pointer
        # code omitted
In this instance the compiler “knows” that buf is the address of an array of bytes; it can
emit code to rapidly compute the address of buf[x] at runtime. Where casts are used to
convert objects to Viper native types these should be performed at the start of the
function rather than in critical timing loops as the cast operation can take several
microseconds. The rules for casting are as follows:
Casting operators are currently: int, bool, uint, ptr, ptr8, ptr16 and ptr32.
The result of a cast will be a native Viper variable.
Arguments to a cast can be a Python object or a native Viper variable.
If the argument is a native Viper variable, then the cast is a no-op (i.e. costs nothing at
runtime) that just changes the type (e.g. from uint to ptr8) so that you can
then store/load using this pointer.
If the argument is a Python object and the cast is int or uint, then the
Python object must be of integral type and the value of that integral object is
returned.
The argument to a bool cast must be of integral type (boolean or integer); when
used as a return type the viper function will return True or False objects.
If the argument is a Python object and the cast is ptr, ptr8, ptr16 or ptr32,
then the Python object must either have the buffer protocol (in which case a
pointer to the start of the buffer is returned) or it must be of integral type (in
which case the value of that integral object is returned).
Writing to a pointer which points to a read-only object will lead to undefined behaviour.
The following example illustrates the use of a ptr16 cast to toggle pin X1 n times:
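import micropython
import stm
from micropython import const

BIT0 = const(1)

@micropython.viper
def toggle_n(n: int):
    odr = ptr16(stm.GPIOA + stm.GPIO_ODR)  # GPIOA output data register
    for _ in range(n):
        odr[0] ^= BIT0  # flip bit 0, which drives pin X1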
A detailed technical description of the three code emitters may be found on Kickstarter
in Note 1 and Note 2.
Code examples in this section are given for the Pyboard. The techniques described
however may be applied to other MicroPython ports too.
This comes into the category of more advanced programming and involves some
knowledge of the target MCU. Consider the example of toggling an output pin on the
Pyboard. The standard approach would be to write
import machine
import stm