
Efficient use of Python

Ariel Lozano

CÉCI training

November 22, 2017


Outline

I Analyze our code with profiling tools:


I cpu: cProfile, line_profiler, kernprof
I memory: memory_profiler, mprof
I Python being a highly abstract, dynamically typed language,
how can we make more efficient use of the hardware?
I Numpy and Scipy ecosystem (mainly wrappers to C/Fortran
compiled code)
I binding to compiled code: interfaces between python and
compiled modules
I compiling: tools to compile python code
I parallelism: modules to exploit multicores
Sieve of Eratosthenes
Algorithm to find all prime numbers up to any given limit.
Ex: find all the prime numbers less than or equal to 25:
I 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
I Cross out every second number after 2, up to the limit
(4, 6, 8, ..., 24 are crossed out).
I Move to the next non-crossed number n and cross out every
non-crossed number displaced by n (n = 3 crosses out 9, 15, 21;
n = 5 crosses out 25).
I The numbers remaining non-crossed in the list are all the
primes below the limit: 2 3 5 7 11 13 17 19 23.
Trivial optimization: jump directly to n² to start crossing out.
Then, n must loop only up to √limit.
Simple python implementation

def primes_upto(limit):
    sieve = [False] * 2 + [True] * (limit - 1)
    for n in xrange(2, int(limit**0.5 + 1)):
        if sieve[n]:
            i = n**2
            while i < limit + 1:
                sieve[i] = False
                i += n
    return [i for i, prime in enumerate(sieve) if prime]

if __name__ == "__main__":
    primes = primes_upto(25)
    print(primes)

$ python sieve01.py

[2, 3, 5, 7, 11, 13, 17, 19, 23]
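For reference, a Python 3 port of the same sieve only swaps xrange for range (the slides target Python 2 throughout):

```python
def primes_upto(limit):
    # Boolean flags: index i stays True while i is still a prime candidate
    sieve = [False] * 2 + [True] * (limit - 1)
    for n in range(2, int(limit**0.5 + 1)):
        if sieve[n]:
            # Start crossing out at n*n; smaller multiples were already hit
            i = n**2
            while i < limit + 1:
                sieve[i] = False
                i += n
    return [i for i, prime in enumerate(sieve) if prime]

print(primes_upto(25))  # → [2, 3, 5, 7, 11, 13, 17, 19, 23]
```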


Measuring running time

Computing primes up to 30 000 000:


I linux time command
$ time python sieve01.py

real 0m10.419s
user 0m10.192s
sys 0m0.217s
I using timeit module to average several runs
$ python -m timeit -n 3 -r 3 -s "import sieve01" \
> "sieve01.primes_upto(30000000)"

3 loops, best of 3: 10.2 sec per loop
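The same measurement can also be scripted with the timeit module. A minimal sketch (a tiny stand-in statement keeps it self-contained; in practice you would pass the sieve call and setup strings shown above):

```python
import timeit

# repeat=3, number=3 mirrors `-r 3 -n 3`: three samples, each running
# the statement three times; report the best sample.
# In practice: timeit.repeat("sieve01.primes_upto(30000000)",
#                            setup="import sieve01", repeat=3, number=3)
timings = timeit.repeat("sum(range(1000))", repeat=3, number=3)
print("3 loops, best of 3: %.3g sec per loop" % min(timings))
```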


CPU profiling: timing functions

cProfile: built-in profiling tool in the standard library. It hooks


into the virtual machine to measure the time taken to run every
function that it sees.
$ python -m cProfile -s cumulative sieve01.py
5 function calls in 10.859 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)


1 0.000 0.000 10.859 10.859 {built-in method builtins.exec}
1 0.087 0.087 10.859 10.859 sieve01.py:3(<module>)
1 9.447 9.447 10.772 10.772 sieve01.py:3(primes_upto)
1 1.325 1.325 1.325 1.325 sieve01.py:11(<listcomp>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler

I Useful information, but for big codebases we will need extra
tools to visualize the dumps
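For quick inspection without an external viewer, the profile can also be collected and read programmatically with the standard pstats module. A sketch on a toy function (the function name is made up for the demo):

```python
import cProfile
import io
import pstats

def toy(n):
    # Deliberately CPU-bound loop so it shows up in the profile
    return sum(i * i for i in range(n))

pr = cProfile.Profile()
pr.enable()
toy(100000)
pr.disable()

# Sort by cumulative time, like `-s cumulative`, and show the top entries
buf = io.StringIO()
stats = pstats.Stats(pr, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```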
CPU profiling: line by line details of a function
line_profiler: profiles individual functions on a line-by-line
basis; it introduces a big overhead. We must add the @profile
decorator to the function to be analyzed.
@profile
def primes_upto(limit):
    sieve = [False] * 2 + [True] * (limit - 1)
    for n in xrange(2, int(limit**0.5 + 1)):
        if sieve[n]:
            i = n**2
            while i < limit + 1:
                sieve[i] = False
                i += n
    return [i for i, prime in enumerate(sieve) if prime]

if __name__ == "__main__":
    primes = primes_upto(30000000)

Then, we run the code with the kernprof.py script provided by


the package.
CPU profiling: line by line details of a function
$ kernprof -l -v sieve01_prof.py
Wrote profile results to sieve01_prof.py.lprof
Timer unit: 1e-06 s

Total time: 101.025 s


File: sieve01_prof.py
Function: primes_upto at line 2

Line # Hits Time Per Hit % Time Line Contents


==============================================================
2 @profile
3 def primes_upto(limit):
4 1 415906 415906.0 0.4 sieve = [False] * 2 + [True] *
5 5477 2307 0.4 0.0 for n in xrange(2, int(limit**0
6 5476 2362 0.4 0.0 if sieve[n]:
7 723 680 0.9 0.0 i = n**2
8 70634832 28740579 0.4 28.4 while i < limit+1:
9 70634109 33142484 0.5 32.8 sieve[i] = False
10 70634109 26776815 0.4 26.5 i += n
11 30000002 11943768 0.4 11.8 return [i for i, prime in enume

% Time is relative to each function only, not to total running time.


Memory profiling: line by line details of a function

memory_profiler: module to measure memory usage on a
line-by-line basis; runs will be even slower than with
line_profiler. The @profile decorator is also required on the
function to analyze.
$ python -m memory_profiler sieve01_prof.py
Filename: sieve01_prof.py

Line # Mem usage Increment Line Contents


================================================
2 32.715 MiB 0.000 MiB @profile
3 def primes_upto(limit):
4 261.703 MiB 228.988 MiB sieve = [False] * 2 + [True] * (limit - 1)
5 261.703 MiB 0.000 MiB for n in xrange(2, int(limit**0.5 + 1)):
6 261.703 MiB 0.000 MiB if sieve[n]:
7 261.703 MiB 0.000 MiB i = n**2
8 261.703 MiB 0.000 MiB while i < limit+1:
9 261.703 MiB 0.000 MiB sieve[i] = False
10 261.703 MiB 0.000 MiB i += n
11 return [i for i, prime in enumerate(sieve) if
Memory profiling: line by line details of a function

Why are 228 MB allocated on this line?


4 261.703 MiB 228.988 MiB sieve = [False] * 2 + [True] * (limit - 1)

I In a Python list each boolean element has a size of 8 bytes:
the size of a pointer (a C long) on 64-bit platforms.
I We are creating a list with 30000002 elements.
I Doing the math: 30000002 * 8 / (1024 * 1024) = 228.881 MiB
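The per-element cost can be checked directly with sys.getsizeof; a small sketch (the 1000-element list is just a probe, the ~8-byte figure assumes a 64-bit CPython):

```python
import sys

# Each list slot holds an 8-byte pointer on a 64-bit build;
# the True/False objects themselves are shared singletons.
small = [True] * 1000
per_element = (sys.getsizeof(small) - sys.getsizeof([])) / 1000
print(per_element)  # ~8.0 on 64-bit CPython

# Scaling up to the sieve list from the slide:
print(30000002 * 8 / (1024 * 1024))  # ≈ 228.88 MiB
```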
Memory profiling: analyzing the whole run vs time

I Line-by-line analysis introduces a huge overhead; runs can be
up to 100x slower
I We can miss information due to many
allocations/deallocations taking place on a single line
I The memory_profiler package provides the mprof tool to
analyze and visualize the memory usage as a function of
time
I It has a very minor impact on the running time
I Usage:
$ mprof run --python python2 mycode.py
$ mprof plot
Memory profiling: analyzing the whole run vs time
$ mprof run --python python2 sieve01.py
$ mprof plot
Memory profiling: analyzing the whole run vs time

We can add the @profile decorator and timestamps to bring more
detail into the analysis.
@profile
def primes_upto(limit):
    with profile.timestamp("create_sieve_list"):
        sieve = [False] * 2 + [True] * (limit - 1)
    with profile.timestamp("cross_out_sieve"):
        for n in xrange(2, int(limit**0.5 + 1)):
            if sieve[n]:
                i = n**2
                while i < limit + 1:
                    sieve[i] = False
                    i += n
    return [i for i, prime in enumerate(sieve) if prime]
Memory profiling: analyzing the whole run vs time
$ mprof run --python python2 sieve01_memprof.py
$ mprof plot
Memory profiling: analyzing the whole run vs time
Why the 500 MB peak during the sieve list creation?
I Experimenting with the mprof tool, it can be verified that:

sieve = [False] * 2 + [True] * (limit - 1)

I is actually equivalent to something like:

sieve1 = [False] * 2
sieve2 = [True] * (limit - 1)
sieve = sieve1 + sieve2
del sieve1
del sieve2

I An extra list of ≈ 30e6 booleans is temporarily allocated!
I We can try to replace it with:

sieve = [True] * (limit + 1)
sieve[0] = False
sieve[1] = False
Memory profiling: analyzing the whole run vs time
$ mprof run --python python2 sieve02_memprof.py
$ mprof plot
Numpy library

I Provides a new kind of array datatype


I Contains methods for fast operations on entire arrays
without having to write loops
I They are basically wrappers to compiled C/Fortran/C++
code
I Runs almost as quickly as C
I It is the foundation of many other higher-level numerical
tools
I Compares to MATLAB in functionality
Numpy library: matrix vector product

>>> import numpy as np
>>> a = np.array([[5, 1, 3],
...               [1, 1, 1],
...               [1, 2, 1]])
>>> b = np.array([1, 2, 3])
>>> c = a.dot(b)
>>> c
array([16,  6,  8])
Numpy library: sieve revisited

We replace the sieve list with a Numpy boolean array:


import numpy as np

def primes_upto(limit):
    sieve = np.ones(limit + 1, dtype=np.bool)
    sieve[0] = False
    sieve[1] = False
    for n in xrange(2, int(limit**0.5 + 1)):
        if sieve[n]:
            i = n**2
            while i < limit + 1:
                sieve[i] = False
                i += n
    return [i for i, prime in enumerate(sieve) if prime]
Numpy library: sieve revisited

I In a Numpy array each boolean has a size of 1 byte
I Math now: 30000002 * 1 / (1024 * 1024) = 28.61 MiB
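The one-byte-per-element claim can be verified with the array's nbytes and itemsize attributes (a small sketch; assumes NumPy is installed, and uses the plain bool dtype, which NumPy maps to np.bool_):

```python
import numpy as np

sieve = np.ones(1000, dtype=bool)
print(sieve.nbytes)    # 1000: one byte per boolean
print(sieve.itemsize)  # 1
```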
Numpy library: sieve revisited
I Timing did not improve with the Numpy array and the same loop
I Fully Numpy solution, using slice indexing to iterate:

import numpy as np

def primes_upto(limit):
    sieve = np.ones(limit + 1, dtype=np.bool)
    sieve[0] = False
    sieve[1] = False
    for n in xrange(2, int(limit**0.5 + 1)):
        if sieve[n]:
            sieve[n*n::n] = 0
    return np.nonzero(sieve)[0]

$ time python2 sieve04_np.py


real 0m0.552s
user 0m0.518s
sys 0m0.033s

I 22x gain in time!!


Numpy library: sieve line by line profiling
$ kernprof -l -v sieve04_np_prof.py
Wrote profile results to sieve04_np_prof.py.lprof
Timer unit: 1e-06 s

Total time: 0.482723 s


File: sieve04_np_prof.py
Function: primes_upto at line 3

Line # Hits Time Per Hit % Time Line Contents


==============================================================
3 @profile
4 def primes_upto(limit):
5 1 8785 8785.0 1.8 sieve = np.ones(limit
6 1 5 5.0 0.0 sieve[0] = False
7 1 0 0.0 0.0 sieve[1] = False
8 5477 2796 0.5 0.6 for n in xrange(2, int
9 5476 3119 0.6 0.6 if sieve[n]:
10 723 420784 582.0 87.2 sieve[n**2::n]
11 1 47234 47234.0 9.8 return np.nonzero(siev
Numpy library: sieve line by line profiling

I line_profiler helps to understand the massive gain


I Pure python solution:
6 5476 2362 0.4 0.0 if sieve[n]:
7 723 680 0.9 0.0 i = n**2
8 70634832 28740579 0.4 28.4 while i < limit+1:
9 70634109 33142484 0.5 32.8 sieve[i] = False
10 70634109 26776815 0.4 26.5 i += n
I Full Numpy solution:
9 5476 3119 0.6 0.6 if sieve[n]:
10 723 420784 582.0 87.2 sieve[n**2::n] = 0
I The loops that cross out the sieve are performed entirely by
lower-level compiled code inside Numpy
I Time and memory usage are comparable to compiled C or
Fortran solutions!
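The equivalence between the explicit while loop and the single slice assignment can be checked on a small array (a sketch assuming NumPy; limit and n are arbitrary demo values):

```python
import numpy as np

limit = 100
n = 7

# Explicit Python loop, as in the pure-python version
a = np.ones(limit + 1, dtype=bool)
i = n**2
while i < limit + 1:
    a[i] = False
    i += n

# Single vectorized slice assignment, as in the NumPy version
b = np.ones(limit + 1, dtype=bool)
b[n**2::n] = False

print(np.array_equal(a, b))  # → True
```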
CPU and Memory profiling: summary

I Line-by-line profilers introduce a huge overhead; use them
with a reduced problem size and only on the specific functions
detected as bottlenecks
I The mprof tool is very flexible; with smart timestamping it
can be used both as a fast CPU and memory profiler
I The cProfile dumps are great to detect bottlenecks on big
projects, but a visualization tool is almost mandatory.
Explore the KCachegrind package; usual workflow:
$ python -m cProfile -o prof.out sieve02.py
$ pyprof2calltree -i prof.out -k
Numpy library: SciPy ecosystem
Collection of open source software for scientific computing in
Python
I Core packages:
I NumPy: the fundamental package for numerical computation
I SciPy library: collection of numerical algorithms and
domain-specific toolboxes, including signal processing, Fourier
transforms, clustering, optimization, statistics...
I Matplotlib: a mature plotting package, provides publication-quality
2D plotting as well as rudimentary 3D plotting
I Data and computation:
I pandas: providing high-performance, easy to use data structures
(similar to R)
I SymPy: symbolic mathematics and computer algebra
I scikit-image: algorithms for image processing
I scikit-learn: algorithms and tools for machine learning
I h5py and PyTables: can both access data stored in the HDF5
format
Python Bindings

I Interfacing python with compiled code can provide huge


performance gains
I f2py: project to provide a connection between Python and
Fortran languages
I weave: tools for including C/C++ code within Python code
I cffi (C Foreign Function Interface for Python): Interact with
almost any C code from Python.
I ctypes: foreign function library for Python. It provides C
compatible data types and allows calling functions in DLLs
or shared libraries.
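A minimal ctypes sketch, calling strlen from the C standard library (assumes a standard libc can be located on the system; on most POSIX systems find_library("c") resolves it):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello from C"))  # → 12
```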
Python Bindings: f2py example

hello.f90:

subroutine foo(a)
  integer a
  print *, "Hello from Fortran!"
  print *, "a = ", a
end

call_fhello.py:

import hello

if __name__ == "__main__":
    hello.foo(10)

$ f2py2 -c -m hello hello.f90
$ python2 call_fhello.py
Hello from Fortran!
a = 10
Compiled Python
There are also tools to compile python code
I cython: C-Extensions for Python
I optimising and static compiler
I can compile Python code and Cython language
I can compile Python with Numpy code
I can do bindings with C code
I Pypy: Just-in-time compiler
I sometimes less memory hungry than Cython
I not fully compatible with Python code that uses Numpy
I Numba: a compiler specialized for numpy code using the
LLVM compiler
I Pythran: compiler for both numpy and non-numpy code.
Takes advantage of multi-cores and single instruction
multiple data (SIMD) units
I All of them, except PyPy, require modifying or decorating the
original Python code
Compiled Python: pypy

I We can directly run the original sieve01.py with pypy


$ time pypy sieve01.py

real 0m2.593s
user 0m2.222s
sys 0m0.294s
Parallel processing

I multiprocessing module
I allows process- and thread-based parallel processing
I allows sharing memory among processes
I constrained to single-machine multicore parallelism
I mpi4py
I Python bindings to the MPI-1/2/3 interfaces
I if you know MPI in C/Fortran you already know mpi4py
I can equally use multiple cores on a single machine or
distributed across several machines
I each process has a separate address space; no possibility to
share memory between them
I we covered it in the MPI session
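A minimal multiprocessing sketch, farming a CPU-bound primality check out to a pool of workers (not from the slides; trial division is used only to give each worker something to compute):

```python
from multiprocessing import Pool

def is_prime(n):
    # Simple trial division up to sqrt(n); enough for a demo
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

if __name__ == "__main__":
    # pool.map splits the input range across 4 worker processes
    with Pool(processes=4) as pool:
        flags = pool.map(is_prime, range(26))
    primes = [n for n, f in zip(range(26), flags) if f]
    print(primes)  # → [2, 3, 5, 7, 11, 13, 17, 19, 23]
```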
Further information on the topic

I High Performance Python, by Micha Gorelick and Ian Ozsvald
I Python in HPC Tutorial:
https://github.com/pyHPC/pyhpc-tutorial
