Mastering Python High Performance: Learn how to optimize your code and Python performance with this vital guide to Python performance profiling and benchmarking
About this ebook
Fernando Doglio
Fernando Doglio has been working as a web developer for the past 10 years. During that time, he shifted his focus to the Web and grabbed the opportunity of working with most of the leading technologies, such as PHP, Ruby on Rails, MySQL, Python, Node.js, AngularJS, AJAX, REST APIs, and so on. In his spare time, Fernando likes to tinker and learn new things. This is why his GitHub account keeps getting new repos every month. He's also a big open source supporter and tries to win the support of new people with the help of his website, lookingforpullrequests.com. You can reach him on Twitter at @deleteman123. When he is not programming, he spends time with his family.
Mastering Python High Performance - Fernando Doglio
Table of Contents
Mastering Python High Performance
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Profiling 101
What is profiling?
Event-based profiling
Statistical profiling
The importance of profiling
What can we profile?
Execution time
Where are the bottlenecks?
Memory consumption and memory leaks
The risk of premature optimization
Running time complexity
Constant time – O(1)
Linear time – O(n)
Logarithmic time – O(log n)
Linearithmic time – O(n log n)
Factorial time – O(n!)
Quadratic time – O(n^2)
Profiling best practices
Build a regression-test suite
Mind your code
Be patient
Gather as much data as you can
Preprocess your data
Visualize your data
Summary
2. The Profilers
Getting to know our new best friends: the profilers
cProfile
A note about limitations
The API provided
The Stats class
Profiling examples
Fibonacci again
Tweet stats
line_profiler
kernprof
Some things to consider about kernprof
Profiling examples
Back to Fibonacci
Inverted index
getOffsetUpToWord
getWords
list2dict
readFileContent
saveIndex
__start__
getOffsetUpToWord
getWords
list2dict
saveIndex
Summary
3. Going Visual – GUIs to Help Understand Profiler Output
KCacheGrind – pyprof2calltree
Installation
Usage
A profiling example – TweetStats
A profiling example – Inverted Index
RunSnakeRun
Installation
Usage
Profiling examples – the lowest common multiplier
A profiling example – search using the inverted index
Summary
4. Optimize Everything
Memoization / lookup tables
Performing a lookup on a list or linked list
Simple lookup on a dictionary
Binary search
Use cases for lookup tables
Usage of default arguments
List comprehension and generators
ctypes
Loading your own custom C library
Loading a system library
String concatenation
Other tips and tricks
Summary
5. Multithreading versus Multiprocessing
Parallelism versus concurrency
Multithreading
Threads
Creating a thread with the thread module
Working with the threading module
Interthread communication with events
Multiprocessing
Multiprocessing with Python
Exit status
Process pooling
Interprocess communication
Pipes
Events
Summary
6. Generic Optimization Options
PyPy
Installing PyPy
A Just-in-time compiler
Sandboxing
Optimizing for the JIT
Think of functions
Consider using cStringIO to concatenate strings
Actions that disable the JIT
Code sample
Cython
Installing Cython
Building a Cython module
Calling C functions
Solving naming conflicts
Defining types
Defining types during function definitions
A Cython example
When to define a type
Limitations
Generator expressions
Comparison of char* literals
Tuples as function arguments
Stack frames
How to choose the right option
When to go with Cython
When to go with PyPy
Summary
7. Lightning Fast Number Crunching with Numba, Parakeet, and pandas
Numba
Installation
Using Numba
Numba's code generation
Eager compilation
Other configuration settings
No GIL
NoPython mode
Running your code on the GPU
The pandas tool
Installing pandas
Using pandas for data analysis
Parakeet
Installing Parakeet
How does Parakeet work?
Summary
8. Putting It All into Practice
The problem to solve
Getting data from the Web
Postprocessing the data
The initial code base
Analyzing the code
Scraper
Analyzer
Summary
Index
Mastering Python High Performance
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: September 2015
Production reference: 1030915
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-930-0
www.packtpub.com
Credits
Author
Fernando Doglio
Reviewers
Erik Allik
Mike Driscoll
Enrique Escribano
Mosudi Isiaka
Commissioning Editor
Kunal Parikh
Acquisition Editors
Vivek Anantharaman
Richard Brookes-Bland
Content Development Editors
Akashdeep Kundu
Rashmi Suvarna
Technical Editor
Vijin Boricha
Copy Editors
Relin Hedly
Karuna Narayanan
Project Coordinator
Milton Dsouza
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Sheetal Aute
Production Coordinator
Arvindkumar Gupta
Cover Work
Arvindkumar Gupta
About the Author
Fernando Doglio has been working as a web developer for the past 10 years.
During that time, he shifted his focus to the Web and grabbed the opportunity of working with most of the leading technologies, such as PHP, Ruby on Rails, MySQL, Python, Node.js, AngularJS, AJAX, REST APIs, and so on.
In his spare time, Fernando likes to tinker and learn new things. This is why his GitHub account keeps getting new repos every month. He's also a big open source supporter and tries to win the support of new people with the help of his website, lookingforpullrequests.com.
You can reach him on Twitter at @deleteman123.
When he is not programming, he spends time with his family.
I'd like to thank my lovely wife for putting up with me and the long hours I spent writing this book; this book would not have been possible without her continued support. I would also like to thank my two sons. Without them, this book would've been finished months earlier.
Finally, I'd like to thank the reviewers and editors. They helped me get this book in shape and achieve the quality level that you deserve.
About the Reviewers
Erik Allik is a self-taught multilingual, multiparadigm full-stack software engineer. He started programming at the age of 14. Since then, Erik has been working with many programming languages (both imperative and functional) and various web and non-web-related technologies.
He has worked primarily with Python, Scala, and JavaScript. Erik is currently focusing on applying Haskell and other innovative functional programming techniques in various industries and leveraging the power of a mathematical approach and formalism in the wild.
Mike Driscoll has been programming in Python since 2006. He enjoys writing about Python on his blog at http://www.blog.pythonlibrary.org/. Mike has coauthored Core Python refcard for DZone. He recently authored Python 101 and was a technical reviewer for the following books by Packt Publishing:
Python 3 Object-Oriented Programming
Python 2.6 Graphics Cookbook
Tkinter GUI Application Development Hotshot
I would like to thank my beautiful wife, Evangeline, for supporting me throughout. I would also like to thank my friends and family for all their help. Also, thank you Jesus Christ for taking good care of me.
Enrique Escribano lives in Chicago and is working as a software engineer at Nokia. Although he is just 23 years old, he holds a master's degree in computer science from IIT (Chicago) and a master of science degree in telecommunication engineering from ETSIT-UPM (Madrid). Enrique has also worked as a software engineer at KeepCoding and as a developer intern at Telefonica, SA, the most important Spanish tech company.
He is an expert in Java and Python and is proficient in using C/C++. Most of his projects involve working with cloud-based technologies, such as AWS, GAE, Hadoop, and so on. Enrique is also working on an open source research project based on security with software-defined networking (SDN) with professor Dong Jin at IIT Security Lab.
You can find more information about Enrique on his personal website at enriquescribano.com. You can also reach him on LinkedIn at linkedin.com/in/enriqueescribano.
I would like to thank my parents, Lucio and Carmen, for all the unconditional support they have provided me with over the years. They allowed me to be as ambitious as I wanted. Without them, I may never have gotten to where I am today.
I would like to thank my siblings, Francisco and Marta. Being the eldest brother is challenging, but you both keep inspiring me every day.
Lastly, I would also like to thank Paula for always being my main inspiration and motivation since the very first day. I am so fortunate to have her in my life.
Mosudi Isiaka is a graduate in electrical and computer engineering from the Federal University of Technology Minna, Niger State, Nigeria. He demonstrates excellent skills in numerous aspects of information and communication technology. From a simple network to a mid-level complex network scenario of no less than one thousand workstations (Microsoft Windows 7, Microsoft Windows Vista, and Microsoft Windows XP), along with a Microsoft Windows 2008 Server R2 Active Directory domain controller deployed in more than a single location, Mosudi has extensive experience in implementing and managing a local area network. He has successfully set up a data center infrastructure, VPN, WAN link optimization, firewall and intrusion detection system, web/e-mail hosting control panel, OpenNMS network management application, and so on.
Mosudi has the ability to use open source software and applications to achieve enterprise-level network management solutions in scenarios that cover a virtual private network (VPN), IP PBX, cloud computing, clustering, virtualization, routing, high availability, customized firewall with advanced web filtering, network load balancing, failover and link aggregation for multiple Internet access solutions, traffic engineering, collaboration suites, network-attached storage (NAS), Linux systems administration, and virtual networking and computing.
He is currently employed as a data center manager at One Network Ltd., Nigeria. Mosudi also works with ServerAfrica (http://www.serverafrica.com) as a managing consultant (technicals).
You can find more information about him at http://www.mioemi.com. You can also reach him at http://ng.linkedin.com/pub/isiaka-mosudi/1b/7a2/936/.
I would like to thank my amiable wife, Mosudi Efundayo Coker, for her moral support.
Also, many thanks to my colleague, Oyebode Micheal Tosin, for his timely reminders and technical suggestions during the reviewing process.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
The idea of this book came to me from the nice people at Packt Publishing. They wanted someone who could delve into the intricacies of high performance in Python and everything related to this subject, be it profiling, the available tools (such as profilers and other performance enhancement techniques), or even alternatives to the standard Python implementation.
Having said that, I welcome you to Mastering Python High Performance. In this book, we'll cover everything related to performance improvements. Knowledge about the subject is not strictly required (although it won't hurt), but knowledge of the Python programming language is required, especially in some of the Python-specific chapters.
We'll start by going through the basics of what profiling is, how it fits into the development cycle, and the benefits related to including this practice in it. Afterwards, we'll move on to the core tools required to get the job done (profilers and visual profilers). Then, we will take a look at a set of optimization techniques and finally arrive at a fully practical chapter that will provide a real-life optimization example.
What this book covers
Chapter 1, Profiling 101, provides information about the art of profiling to those who are not aware of it.
Chapter 2, The Profilers, tells you how to use the core tools that will be mentioned throughout the book.
Chapter 3, Going Visual – GUIs to Help Understand Profiler Output, covers how to use the pyprof2calltree and RunSnakeRun tools. It also helps the developer to understand the output of cProfile with different visualization techniques.
Chapter 4, Optimize Everything, talks about the basic process of optimization and a set of good/recommended practices that every Python developer should follow before considering other options.
Chapter 5, Multithreading versus Multiprocessing, discusses multithreading and multiprocessing and explains how and when to apply them.
Chapter 6, Generic Optimization Options, describes and shows you how to install and use Cython and PyPy in order to improve code performance.
Chapter 7, Lightning Fast Number Crunching with Numba, Parakeet, and pandas, talks about tools that help optimize Python scripts that deal with numbers. These specific tools (Numba, Parakeet, and pandas) help make number crunching faster.
Chapter 8, Putting It All into Practice, works through a practical example: it profiles an application, finds its bottlenecks, and removes them using the tools and techniques mentioned in this book. To conclude, we'll compare the results of using each technique.
What you need for this book
Your system must have the following software before executing the code mentioned in this book:
Python 2.7
Line profiler 1.0b2
Kcachegrind 0.7.4
RunSnakeRun 2.0.4
Numba 0.17
The latest version of Parakeet
pandas 0.15.2
Who this book is for
Since the topics tackled in this book cover everything related to profiling and optimizing Python code, Python developers at all levels will benefit from this book.
The only essential requirement is to have some basic knowledge of the Python programming language.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can print/gather the information we deem relevant inside the PROFILER function.
A block of code is set as follows:
import sys

def profiler(frame, event, arg):
    # Invoked by the interpreter for each profiling event (call, return, ...)
    print('PROFILER: %r %r' % (event, arg))

sys.setprofile(profiler)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Traceback (most recent call last):
  File "cprof-test1.py", line 7, in <module>
    runRe() ...
  File "/usr/lib/python2.7/cProfile.py", line 140, in runctx
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
NameError: name 're' is not defined
Any command-line input or output is written as follows:
$ sudo apt-get install python-dev libxml2-dev libxslt-dev
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Again, with the Callee Map selected for the first function call, we can see the entire map of our script.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better