Mastering Python Scientific Computing: A complete guide for Python programmers to master scientific computing using Python APIs and tools
About this ebook
Hemant Kumar Mehta
Hemant Kumar Mehta is a distributed and scientific computing enthusiast. He has more than 13 years of experience of teaching, research, and software development. He received his BSc (in computer science) Hons., master of computer applications degree, and PhD in computer science from Devi Ahilya University, Indore, India in 1998, 2001, and 2011, respectively. He has experience of working in diverse international environments as a software developer in MNCs. He is a post-doctorate fellow at an international university of high reputation. Hemant has published more than 20 highly cited research papers in reputed national and international conferences and journals sponsored by ACM, IEEE, and Springer. He is the author of Getting Started with Oracle Public Cloud, Packt Publishing. He is also the coauthor of a book named Internet and Web Technology, published by Kaushal Prakashan Mandir, Indore. He earned his PhD in the field of cloud computing and big data. Hemant is a member of ACM (Special Interest Group on High-performance Computing Education: SIGHPC-Edu), senior member of IEEE (the computer society, STC on cloud computing, and the big data technical committee), and a senior member of IACSIT, IAENG, and MIR Labs.
Mastering Python Scientific Computing - Hemant Kumar Mehta
Table of Contents
Mastering Python Scientific Computing
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. The Landscape of Scientific Computing – and Why Python?
Definition of scientific computing
A simple flow of the scientific computation process
Examples from scientific/engineering domains
A strategy for solving complex problems
Approximation, errors, and associated concepts and terms
Error analysis
Conditioning, stability, and accuracy
Backward and forward error analysis
Is it okay to ignore these errors?
Computer arithmetic and floating-point numbers
The background of the Python programming language
The guiding principles of the Python language
Why Python for scientific computing?
Compact and readable code
Holistic language design
Free and open source
Language interoperability
Portable and extensible
Hierarchical module system
Graphical user interface packages
Data structures
Python's testing framework
Available libraries
The downsides of Python
Summary
2. A Deeper Dive into Scientific Workflows and the Ingredients of Scientific Computing Recipes
Mathematical components of scientific computations
A system of linear equations
A system of nonlinear equations
Optimization
Interpolation
Extrapolation
Numerical integration
Numerical differentiation
Differential equations
The initial value problem
The boundary value problem
Random number generator
Python scientific computing
Introduction to NumPy
The SciPy library
The SciPy Subpackage
Data analysis using pandas
A brief idea of interactive programming using IPython
IPython parallel computing
IPython Notebook
Symbolic computing using SymPy
The features of SymPy
Why SymPy?
The plotting library
Summary
3. Efficiently Fabricating and Managing Scientific Data
The basic concepts of data
Data storage software and toolkits
Files
Structured files
Unstructured files
Database
Possible operations on data
Scientific data format
Ready-to-use standard datasets
Data generation
Synthetic data generation (fabrication)
Using Python's built-in functions for random number generation
Bookkeeping functions
Functions for integer random number generation
Functions for sequences
Statistical-distribution-based functions
Nondeterministic random number generator
Designing and implementing random number generators based on statistical distributions
A program with simple logic to generate five-digit random numbers
A brief note about large-scale datasets
Summary
4. Scientific Computing APIs for Python
Numerical scientific computing in Python
The NumPy package
The ndarrays data structure
File handling
Some sample NumPy programs
The SciPy package
The optimization package
The interpolation package
Integration and differential equations in SciPy
The stats module
Clustering package and spatial algorithms in SciPy
Image processing in SciPy
Sample SciPy programs
Statistics using SciPy
Optimization in SciPy
Image processing using SciPy
Symbolic computations using SymPy
Computer Algebra System
Features of a general-purpose CAS
A brief idea of SymPy
Core capability
Polynomials
Calculus
Solving equations
Discrete math
Matrices
Geometry
Plotting
Physics
Statistics
Printing
SymPy modules
Simple exemplary programs
Basic symbol manipulation
Expression expansion in SymPy
Simplification of an expression or formula
Simple integrations
APIs and toolkits for data analysis and visualization
Data analysis and manipulation using pandas
Important data structures of pandas
Special features of pandas
Data visualization using matplotlib
Interactive computing in Python using IPython
Sample data analysis and visualization programs
Summary
5. Performing Numerical Computing
The NumPy fundamental objects
The ndarray object
The attributes of an array
Basic operations on arrays
Special operations on arrays (shape change and conversion)
Classes associated with arrays
The matrix subclass
Masked array
The structured/record array
The universal function object
Attributes
Methods
Various available ufunc
The NumPy mathematical modules
Introduction to SciPy
Mathematical functions in SciPy
Advanced modules/packages
Integration
Signal processing (scipy.signal)
Fourier transforms (scipy.fftpack)
Spatial data structures and algorithms (scipy.spatial)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Linear algebra (scipy.linalg)
Sparse eigenvalue problems with ARPACK
Statistics (scipy.stats)
Multidimensional image processing (scipy.ndimage)
Clustering
Curve fitting
File I/O (scipy.io)
Summary
6. Applying Python for Symbolic Computing
Symbols, expressions, and basic arithmetic
Equation solving
Functions for rational numbers, exponentials, and logarithms
Polynomials
Trigonometry and complex numbers
Linear algebra
Calculus
Vectors
The physics module
Hydrogen wave functions
Matrices and Pauli algebra
The quantum harmonic oscillator in 1-D and 3-D
Second quantization
High-energy Physics
Mechanics
Pretty printing
LaTeX Printing
The cryptography module
Parsing input
The logic module
The geometry module
Symbolic integrals
Polynomial manipulation
Sets
The simplify and collect operations
Summary
7. Data Analysis and Visualization
Matplotlib
The architecture of matplotlib
The scripting layer (pyplot)
The artist layer
The backend layer
Graphics with matplotlib
Output generation
The pandas library
Series
DataFrame
Panel
The common functionality among the data structures
Time series and date functions
Handling missing data
I/O operations
Working on CSV files
Ready-to-eat datasets
The pandas plotting
IPython
The IPython console and system shell
The operating system interface
Nonblocking plotting
Debugging
IPython Notebook
Summary
8. Parallel and Large-scale Scientific Computing
Parallel computing using IPython
The architecture of IPython parallel computing
The components of parallel computing
The IPython engine
The IPython controller
IPython view and interfaces
The IPython client
Example of performing parallel computing
A parallel decorator
IPython's magic functions
Activating specific views
Engines and QtConsole
Advanced features of IPython
Fault-tolerant execution
Dynamic load balancing
Pushing and pulling objects between clients and engines
Database support for storing the requests and results
Using MPI in IPython
Managing dependencies among tasks
Functional dependency
Decorators for functional dependency
Graph dependency
Impossible dependencies
The DAG dependency and the NetworkX library
Using IPython on an Amazon EC2 cluster with StarCluster
A note on security of IPython
Well-known parallel programming styles
Issues in parallel programming
Parallel programming
Concurrent programming
Distributed programming
Multiprocessing in Python
Multithreading in Python
Hadoop-based MapReduce in Python
Spark in Python
Summary
9. Revisiting Real-life Case Studies
Scientific computing applications developed in Python
The One Laptop per Child project used Python for their user interface
ExpEYES – eyes for science
A weather prediction application in Python
An aircraft conceptual designing tool and API in Python
OpenQuake Engine
SMS Siemag AG application for energy efficiency
Automated code generator for analysis of High-energy Physics data
Python for computational chemistry applications
Python for developing a Blind Audio Tactile Mapping System
TAPTools for air traffic control
Energy-efficient lights with an embedded system
Scientific computing libraries developed in Python
A maritime designing API by Tribon
Molecular Modeling Toolkit
Standard Python packages
Summary
10. Best Practices for Scientific Computing
The best practices for designing
The implementation of best practices
The best practices for data management and application deployment
The best practices to achieving high performance
The best practices for data privacy and security
Testing and maintenance best practices
General Python best practices
Summary
Index
Mastering Python Scientific Computing
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: September 2015
Production reference: 1180915
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-882-3
www.packtpub.com
Credits
Author
Hemant Kumar Mehta
Reviewers
Austen Groener
Sachin R. Joglekar
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Kevin Colaco
Content Development Editor
Arshiya Umer
Technical Editor
Mohita Vyas
Copy Editor
Vikrant Phadke
Project Coordinator
Sanjeet Rao
Proofreader
Safis Editing
Indexer
Tejal Soni
Graphics
Jason Monteiro
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
About the Author
Hemant Kumar Mehta is a distributed and scientific computing enthusiast. He has more than 13 years of experience of teaching, research, and software development. He received his BSc (in computer science) Hons., master of computer applications degree, and PhD in computer science from Devi Ahilya University, Indore, India in 1998, 2001, and 2011, respectively. He has experience of working in diverse international environments as a software developer in MNCs. He is a post-doctorate fellow at an international university of high reputation.
Hemant has published more than 20 highly cited research papers in reputed national and international conferences and journals sponsored by ACM, IEEE, and Springer. He is the author of Getting Started with Oracle Public Cloud, Packt Publishing. He is also the coauthor of a book named Internet and Web Technology, published by Kaushal Prakashan Mandir, Indore.
He earned his PhD in the field of cloud computing and big data. Hemant is a member of ACM (Special Interest Group on High-performance Computing Education: SIGHPC-Edu), senior member of IEEE (the computer society, STC on cloud computing, and the big data technical committee), and a senior member of IACSIT, IAENG, and MIR Labs.
I am extremely thankful to my PhD supervisors, namely Professor Priyesh Kanungo and the late Professor Manohar Chandwani from Devi Ahilya University. Their words work as continuous guiding lights in my career and life.
I express heartfelt thanks to my dear student and friend, Pawan Pawar, for helping me develop some programs for this book.
I am also thankful to the entire Packt Publishing team and the reviewers for their tremendous support in maintaining the highest quality of work in this book.
Most of all, I thank my family. I am infinitely grateful to my parents. I thank my wife, Priya, and darling sons, Luv and Darsh, for whom this acknowledgement cannot be covered in words.
About the Reviewers
Austen Groener was raised in Southfield, Massachusetts, USA. He completed his BA in physics from Hartwick College and went on to pursue his MS and PhD in physics from Drexel University in Philadelphia, Pennsylvania, USA. He is a reputed astrophysicist, with research interests surrounding the detailed distribution of dark matter within the largest objects in the universe—galaxy clusters. When he is not studying the cosmos, he enjoys spending his free time developing software tools for other astronomers to use. Austen has a newfound interest in web development.
I would like to thank my family and friends for their unwavering support. To my wife, Brittany: you are the love of my life, my best friend, and my inspiration.
Sachin R. Joglekar is a computer science graduate from BITS-Pilani (Goa campus) in India. His areas of interest primarily include machine learning and intelligent systems. He graduated in December 2014. Since then, he has been working as the cofounder of a start-up based in Mumbai. His work involves the design and development of server infrastructure and backend analytics for sensor networks. Sachin has also worked as an open source developer for SymPy, a symbolic computing library written in pure Python. His work at Google Summer of Code 2014 involved developing the vector module for SymPy.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
To my parents and my gurus, Late Prof. Manohar Chandwani and Prof. Priyesh Kanungo
Preface
This book covers the Python APIs and toolkits used to perform scientific computing. It is highly recommended for readers who perform computerized engineering or scientific computations. Scientific computing is an interdisciplinary field that requires a background in computer science, mathematics, engineering, and at least one branch of general science (such as physics, chemistry, environmental science, or biology). Python offers a large number of packages, APIs, and toolkits that support the functionality required by these diverse scientific and engineering domains.
A large community of users, lots of help and documentation, a large collection of scientific libraries and environments, great performance, and good support make Python a great choice for scientific computing.
What this book covers
Chapter 1, The Landscape of Scientific Computing – and Why Python?, introduces the basic concepts of scientific computing. It also discusses the background of Python, its guiding principles, and why Python is an efficient choice for scientific computing.
Chapter 2, A Deeper Dive into Scientific Workflows and the Ingredients of Scientific Computing Recipes, discusses the various concepts of mathematical and numerical analysis that are generally required to solve scientific problems. It also covers a brief introduction to the packages, toolkits, and APIs meant for performing scientific computing in the Python language.
Chapter 3, Efficiently Fabricating and Managing Scientific Data, discusses the underlying data of scientific applications, including the basic concepts, the possible operations, and the formats and software used to store data. It also presents standard datasets and techniques for preparing synthetic data.
Chapter 4, Scientific Computing APIs for Python, covers the basic concepts, features, and selected sample programs of various scientific computing APIs and toolkits, including NumPy, SciPy, and SymPy. A basic introduction to interactive computing, data analysis, and data visualization is also discussed in this chapter using IPython, matplotlib, and pandas.
Chapter 5, Performing Numerical Computing, discusses how to perform numerical computations using the NumPy and SciPy packages of Python. This chapter starts with the basics of numerical computation and covers a number of advanced concepts, such as optimization, interpolation, Fourier transformation, signal processing, linear algebra, statistics, spatial algorithms, image processing, file input/output, and others.
Chapter 6, Applying Python for Symbolic Computing, starts with the fundamentals of the Computer Algebra System (CAS) and performing symbolic computations using SymPy. It covers a vast range of topics on CAS, from using simple expressions and basic arithmetic to advanced concepts of mathematics and physics.
Chapter 7, Data Analysis and Visualization, presents the concepts and applications of matplotlib and pandas for data analysis and visualization.
Chapter 8, Parallel and Large-scale Scientific Computing, discusses the concepts of high-performance scientific computing: parallel computing with IPython (including its use with MPI), the management of an Amazon EC2 cluster using StarCluster, multiprocessing, multithreading, Hadoop, and Spark.
Chapter 9, Revisiting Real-life Case Studies, illustrates several case studies of scientific computing applications, libraries, and tools developed using the Python language. The cases are drawn from various engineering and science domains.
Chapter 10, Best Practices for Scientific Computing, discusses the best practices for scientific computing. It covers the best practices for designing, coding, data management, application deployment, high-performance computing, security, data privacy, maintenance, and support. We also cover the best practices for general Python-based development.
What you need for this book
The example programs given in this book require a computer with Python 2.7.9 or a higher version, and several Python APIs/packages/toolkits. You will also require some Python libraries (namely NumPy, SciPy, SymPy, matplotlib, pandas, IPython), the IPython.parallel package, pyzmq, SSH for security (if necessary), and Hadoop.
Who this book is for
The book is intended for Python programmers willing to get hands-on exposure to scientific computing. The book expects that you have had exposure to various concepts of Python programming.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The functions of the random module are bound methods of a hidden instance of the random.Random class.
A block of code is set as follows:
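import random
# Single-argument print() calls run under both Python 2.7 and Python 3
print(random.random())             # float in [0.0, 1.0)
print(random.uniform(1, 9))        # float between 1 and 9
print(random.randrange(20))        # integer from 0 to 19
print(random.randrange(0, 99, 3))  # multiple of 3 from 0 to 96
print(random.choice('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))  # one random letter, for example 'P'
items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
random.shuffle(items)              # shuffles the list in place
print(items)
print(random.sample([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # five distinct elements
# Weighted choice: repeat each value cnt times, then pick uniformly
weighted_choices = [('Three', 3), ('Two', 2), ('One', 1), ('Four', 4)]
population = [val for val, cnt in weighted_choices for i in range(cnt)]
print(random.choice(population))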
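The earlier remark that the functions of the random module are bound methods of a hidden random.Random instance can be checked directly. The following is a minimal sketch and is not taken from the book's code bundle; the seed value 42 and the names rng_a, rng_b, seq_a, and seq_b are assumptions chosen only for illustration.

```python
import random

# Two independent generators, each with its own internal state.
# The seed 42 is an arbitrary, hypothetical value used for illustration.
rng_a = random.Random(42)
rng_b = random.Random(42)

# Generators seeded identically produce identical sequences.
seq_a = [rng_a.randrange(20) for _ in range(5)]
seq_b = [rng_b.randrange(20) for _ in range(5)]
print(seq_a == seq_b)  # True

# A module-level function such as random.randrange is a method bound to
# one shared Random instance, reachable through its __self__ attribute.
print(isinstance(random.randrange.__self__, random.Random))  # True
```

Creating a separate random.Random instance, rather than reseeding the shared module-level generator, is one way to keep an experiment reproducible without affecting other code that also draws random numbers.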
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/8823OS.pdf.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. The Landscape of Scientific Computing – and Why Python?
Scientific computing is the use of computerized mathematical modeling and numerical analysis techniques to analyze and solve problems in the science and engineering domains. Scientific problems include problems from various branches of science, such as earth science, space science, social science, life science, physical science, and formal science. These branches cover almost all the science domains that exist, from traditional science to modern engineering science, such as computer science. Engineering problems range from civil and electrical engineering to newer fields such as biomedical engineering.
In this chapter, we will cover the following topics:
Fundamentals of scientific computing
The flow of the scientific computation process
Examples from scientific and engineering domains
The strategy to solve complex problems
Approximation, errors, and related terms
Concepts of error analysis
Computer arithmetic and floating-point numbers
A background of Python
Why choose Python for scientific computing?
Mathematical modeling refers to