Stepanov A. Data Science in Python Vol 2. Data I-O, Jupyter Notebook, GUI,..2016

This document is a guide on using Python for data science, covering topics such as data I/O, GUI programming with Tkinter, and high-performance computing with Numpy. It provides practical examples of reading, manipulating, and sorting data, particularly from files, and emphasizes the importance of handling corrupt data through exception handling. The book is aimed at beginners looking to quickly utilize Python and its libraries for data analysis.

Data Science in Python
Volume 2.

Data I/O, GUI, Jupyter notebook, Deployment,
Numeric programming, High performance Python

Alexander Stepanov

Introduction
Reading data in your script
Reading data from file
Dealing with corrupt data
Manipulating data
Sorting data
Filtering data
Writing data to a file
CSV files
XLSX files
Using Jupyter notebook for user interaction
Display tabular data in IPython notebook
Adding user interaction
GUI programming with TkInter.
Tkinter application
tkinter variables
Button
Slider
Entry and Text widgets
Combobox
Menu
File open and file save dialogs
Diet calculator using Tk
Deployment
High performance computing
Numeric computations with Numpy
Numba - Just In Time Python compiler
Troubleshooting numba functions
Process level parallelism

Introduction
Python is the most popular programming language in scientific computing
today. It is simple, clear, and powerful. It works on Windows, Mac, Linux,
and various other platforms. An excellent introduction to Python can be
found in Python’s online help. In real-world data analysis, Python serves
as a glue for many mature extension libraries that have become the de facto
standard.
This book is for people who want to start using Python and its popular
extension libraries in their work quickly. The best way to start is to install
a scientific Python distribution that supplies many necessary extension
libraries, such as Anaconda (available for Windows, Mac, and Linux) or
WinPython (available for Windows). The installation process is described in
the introductory volume 1 of this series. You might also want to get volume 3,
which describes the plotting library Matplotlib and using Python together
with the SQLite database. I assume that you have a scientific Python bundle
installed on your machine and know how to start the Jupyter notebook we
are going to use for most examples.

Reading data in your script


Reading data from file
Let’s make our data file using Microsoft Excel, LibreOffice Calc, or some
other spreadsheet application and save it as a tab delimited file,
ingredients.txt:

food             carb  fat  protein  calories  serving size
pasta            39    1    7        210       56
parmesan grated  0     1.5  2        20        5
Sour cream       1     5    1        60        30
Chicken breast   0     3    22       120       112
Potato           28    0    3        110       148

Fire up your IPython notebook server. Using the New drop down menu in
the top right corner, create a new Python3 notebook and type the following
Python program into a code cell:

# open file ingredients.txt
with open('ingredients.txt', 'rt') as f:
    for line in f:     # read lines until the end of file
        print(line)    # print each line

Remember that indentation is important in Python programs and designates
nested statements. Run the program using the menu option Cell/Run, the
right arrow button, or the Shift-Enter keyboard shortcut. You can have
many code cells in your IPython notebooks, but only the currently selected
cell is run. Variables generated by previously run cells are accessible, but, if
you just downloaded a notebook, you need to run all the cells that initialize
variables used in the current cell. You can run all the code cells in the
notebook by using the menu option Cell/Run All or Cell/Run All Above.
This program will open a file called ingredients.txt and print it line by line.
The with statement uses a context manager - it opens the file and makes it
known to the nested statements as f. Here, it is used as an idiom to ensure
that the file is closed automatically after we are done reading it. Indentation
before for is required - it shows that for is nested in with and has access to
the variable f designating the file. Function print is nested inside for,
which means it will be executed for every line read from the file until the
end of the file is reached and the for loop quits. It takes just 3 lines of
Python code to iterate over a file of any length.
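To check that with really closes the file, here is a minimal sketch. It is not part of the original program: it assumes a scratch file in a temporary directory (the name demo.txt is made up for illustration) so that it runs without ingredients.txt being present.

```python
import os
import tempfile

# Create a small scratch file (hypothetical name, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'wt') as f:
    f.write('first line\nsecond line\n')

lines = []
with open(path, 'rt') as f:
    for line in f:          # iterate over the file line by line
        lines.append(line)  # each line keeps its trailing '\n'
    inside = f.closed       # the file is still open inside the block

outside = f.closed          # after the block, with has closed the file
print(lines, inside, outside)
```

The attribute closed of a file object reports whether the file has been closed, so inside and outside show the state before and after leaving the with block.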
Now, let’s extract fields from every line. To do this, we will need to use a
string's method split() that splits a line and returns a list of substrings. By
default, it splits the line at every white space character, but our data is
delimited by the tab character - so we will use tab to split the fields. The
tab character is designated \t in Python.

with open('ingredients.txt', 'rt') as f:
    for line in f:
        fields = line.split('\t')  # split line into separate fields
        print(fields)              # print the fields

The output of this code is:

['food', 'carb', 'fat', 'protein', 'calories', 'serving size\n']
['pasta', '39', '1', '7', '210', '56\n']
['parmesan grated', '0', '1.5', '2', '20', '5\n']
['Sour cream', '1', '5', '1', '60', '30\n']
['Chicken breast', '0', '3', '22', '120', '112\n']
['Potato', '28', '0', '3', '110', '148\n']

Now, each line is split conveniently into a list of fields. The last field
contains a pesky \n character designating the end of line. We will remove
it using the strip() method that strips white space characters from both
ends of a string.
After splitting the string into a list of fields, we can access each field using
an indexing operation. For example, fields[0] will give us the first field in
which a food’s name is found. In Python, the first element of a list or an
array has an index 0.
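A short sketch of these string methods on a single line may help. The line below is typed in by hand to mimic one row of the tab delimited file, so nothing is read from disk.

```python
# One raw line as it would be read from the tab delimited file
line = 'Sour cream\t1\t5\t1\t60\t30\n'

by_space = line.split()            # default: splits at every white space
by_tab = line.split('\t')          # tab only: 'Sour cream' stays whole
fields = line.strip().split('\t')  # strip the end of line first

print(by_space)
print(by_tab)
print(fields[0], fields[-1])       # first and last field
```

Note that strip() removes tabs as well as the end of line, so a line whose last field is empty would lose a column. If that matters for your data, rstrip('\n') is a narrower alternative.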
This data is not directly usable yet. All the fields, including those
containing numbers, are represented by strings of characters. This is
indicated by single quotes surrounding the numbers. We want food names
to be strings, but the amounts of nutrients, calories, and serving sizes must
be numbers so we can sort them and do calculations with them. Another
problem is that the first line holds column names. We need to treat it
differently.
One way to do it is to use the file object's method readline() to read the
first line before entering the for loop. Another method is to use the function
enumerate() which will return not only a line, but also its number starting
with zero:

with open('ingredients.txt', 'rt') as f:
    # get line number and the line itself
    # in i and line respectively
    for i, line in enumerate(f):
        fields = line.strip().split('\t')  # split line into fields
        print(i, fields)  # print line number and the fields

This program produces the following output:

0 ['food', 'carb', 'fat', 'protein', 'calories', 'serving size']
1 ['pasta', '39', '1', '7', '210', '56']
2 ['parmesan grated', '0', '1.5', '2', '20', '5']
3 ['Sour cream', '1', '5', '1', '60', '30']
4 ['Chicken breast', '0', '3', '22', '120', '112']
5 ['Potato', '28', '0', '3', '110', '148']
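The readline() alternative mentioned earlier is not shown in the text, so here is a minimal sketch of it. As an assumption made only to keep the example self-contained, an in-memory io.StringIO object with a shortened three-column table stands in for the open file.

```python
import io

# Stand-in for open('ingredients.txt', 'rt'): the same kind of text in memory
f = io.StringIO('food\tcarb\tfat\npasta\t39\t1\npotato\t28\t0\n')

header = f.readline().strip().split('\t')  # consume the header line first
rows = []
for line in f:                             # the loop starts at the second line
    rows.append(line.strip().split('\t'))

print(header)
print(rows)
```

With this approach the loop body needs no special case for the first line, at the price of losing the line number that enumerate() provides.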

Now we know the number of the current line and can treat the first line
differently from all the others. Let’s use this knowledge to convert our data
from strings to numbers. To do this, Python has the function float(). We
have to convert more than one field so we will use a powerful Python
feature called list comprehension.

with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:            # if it is the first line
            print(i, fields)  # treat it as a header
            continue          # go to the next line
        food = fields[0]      # keep food name in food
        # convert numeric fields to numbers
        numbers = [float(n) for n in fields[1:]]
        # print line number, food name and nutritional values
        print(i, food, numbers)

The if statement tests whether a condition is true. To check for equality,
you need to use ==. The index is 0 only for the first line, and it is treated
differently. We split it into fields, print them, and skip the rest of the loop
body using the continue statement.
Lines describing foods are treated differently. After splitting the line into
fields, fields[0] holds the food's name. We keep it in the variable food.
All other fields contain numbers and must be converted.
In Python, we can easily get a subset of a list by using the slicing mechanism.
For instance, list1[x:y] means a list of every element in list1 starting
with index x and ending with index y-1. (You can also include a stride, see
help.) If x is omitted, the slice will contain elements from the beginning of
the list up to the element y-1. If y is omitted, the slice goes from element x
to the end of the list. The expression fields[1:] means every field except
the first, fields[0].

numbers = [float(n) for n in fields[1:]]

means we create a new list numbers by iterating over fields from the second
element on and converting each element to a floating point number.
Finally, we want to reassemble the food's name with its nutritional values
already converted to numbers. To do this, we can create a list containing a
single element - food's name - and add a list containing nutrition data. In
Python, adding lists concatenates them.

[food] + numbers
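The slicing rules and list concatenation described above can be tried on a single record. This is only an illustrative sketch; the list is typed in by hand and matches the pasta row of the data file.

```python
fields = ['pasta', '39', '1', '7', '210', '56']

middle = fields[1:4]    # elements 1, 2 and 3
start = fields[:2]      # from the beginning up to element 1
tail = fields[4:]       # from element 4 to the end
strided = fields[1::2]  # every second element starting with element 1

food = fields[0]
numbers = [float(n) for n in fields[1:]]
record = [food] + numbers  # adding lists concatenates them

print(middle, start, tail, strided)
print(record)
```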

Dealing with corrupt data


Sometimes, just one line in a huge file is formatted incorrectly. For
instance, it might contain a string that cannot be converted to a number.
Unless handled properly, such situations will force a program to crash. In
order to handle such situations, we must use Python's exception handling.
Parts of a program that might fail should be embedded into a try ... except
block. In our program, one such error prone part is the conversion of
strings into numbers.

numbers = [float(n) for n in fields[1:]]

Let's insulate this line. The function float() raises ValueError when it
cannot convert a string, so this is the exception we catch:

with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            print(i, fields)
            continue
        food = fields[0]
        try:                  # watch out for errors!
            numbers = [float(n) for n in fields[1:]]
        except ValueError:    # if a field is not a number
            print(i, line)    # print the offending line and its number
            print(i, fields)  # print how it was split
            continue          # go to the next line without crashing
        print(i, food, numbers)
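To watch the exception handler at work without damaging the real data file, here is a sketch that feeds the same loop from memory. The shortened table and the corrupt 'rice' row are invented for this illustration, and collecting the bad lines in a list instead of printing them is a design choice of the sketch, not of the original program.

```python
import io

# In-memory stand-in for the data file; the second data line is corrupt
text = 'food\tcarb\tfat\npasta\t39\t1\nrice\tn/a\t0\npotato\t28\t0\n'

good, bad = [], []
for i, line in enumerate(io.StringIO(text)):
    fields = line.strip().split('\t')
    if i == 0:
        continue                   # skip the header
    try:
        numbers = [float(n) for n in fields[1:]]
    except ValueError as err:      # raised by float('n/a')
        bad.append((i, str(err)))  # remember line number and reason
        continue
    good.append([fields[0]] + numbers)

print(good)
print(bad)
```

Keeping the rejected lines in a list makes it easy to report all of them at the end of a long run.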

Manipulating data
Sorting data
In order to do something meaningful with the data, we need a container to
hold it. Let’s store information for each food in a list, and create a list of
these lists to represent all the foods. Having all the data conveniently in one
list allows us to sort it easily.

data = []  # create an empty list to hold data
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            header = fields  # remember the header
            continue
        food = fields[0].lower()  # convert to lower case
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:
            print(i, line)
            print(i, fields)
            continue
        # append food info to data list
        data.append([food] + numbers)
# Sort list in place by food name
data.sort(key=lambda a: a[0])
for food in data:  # iterate over the sorted list of foods
    print(food)    # print info for each food

['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]

data=[] creates an empty list and the append() method appends new
elements to the list. The sort() method sorts lists in place. If the list
contains simple values (such as numbers or strings), they are sorted from
small to large or alphabetically by default. We have a list of complex data
and it is not obvious how to sort it. So, we pass a key parameter to the
sort() method. This parameter is a function that takes an element of the list
and returns a simple value that is used to order the elements in the list. In
our case, we used a simple nameless lambda function that took the record
for each food and returned the first element, which is the food's name. So
we ended up with the list sorted alphabetically.
We could also sort the list by the second value, which represents the amount
of carbohydrates per serving. All we have to do is change the lambda
function that calculates the key:

data.sort(key=lambda a: a[1])

This will return the foods in a different order:

['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]

Of course, sorting by amount of carbohydrates per serving doesn't make
much sense because serving sizes might be as different as 5 grams for
parmesan and 148 grams for potatoes. Perhaps ordering foods by the amount
of protein per calorie would make more sense, with the value reflecting the
"healthiness" of the food. Once again, all we need to do is to
change the key function:

data.sort(key=lambda a: a[3] / a[4])

The output is:

['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]

We have the "unhealthiest" food on top. Perhaps we want to start with the
healthiest one. To do this, we need to provide another parameter for the
sort() method - reverse.

data.sort(key=lambda a: a[3] / a[4], reverse=True)

This will reverse the order of the list.

['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
['pasta', 39.0, 1.0, 7.0, 210.0, 56.0]
['potato', 28.0, 0.0, 3.0, 110.0, 148.0]
['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0]

Although it is easy to sort by one or several columns in traditional
spreadsheet applications, it is much harder to sort by complex expressions
that require calculations on values from several columns. Python allows you
to easily do it.
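Two variations on the key idea are sketched below. They go beyond what the text uses: the built-in sorted() returns a new list instead of sorting in place, operator.itemgetter is a standard library shortcut for a lambda that picks one element, and a tuple key sorts by several values at once. The data list is typed in by hand in the order of the file.

```python
from operator import itemgetter

data = [['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
        ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
        ['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0],
        ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
        ['potato', 28.0, 0.0, 3.0, 110.0, 148.0]]

# sorted() returns a new list and leaves data untouched
by_name = sorted(data, key=itemgetter(0))

# A tuple key sorts by carbs first and breaks ties by fat
by_carb_then_fat = sorted(data, key=lambda a: (a[1], a[2]))

names_by_name = [a[0] for a in by_name]
names_by_carb = [a[0] for a in by_carb_then_fat]
print(names_by_name)
print(names_by_carb)
```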

Filtering data
Having our data in a list allows us to filter it with one line of code using list
comprehension, but, this time, we will use a new option for list
comprehension - an if that allows us to exclude some elements from the
new list:

data_filtered = [a for a in data if a[3] / a[4] > 0.09]
for food in data_filtered:
    print(food)

The filtered list is:

['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0]
['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0]
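For readers new to the if clause, the sketch below writes the same filter as an explicit loop and then shows a combined condition. The second filter (fat below 2 grams and some carbohydrates) is an arbitrary example chosen for illustration, and the data list is typed in by hand in the order of the file.

```python
data = [['pasta', 39.0, 1.0, 7.0, 210.0, 56.0],
        ['parmesan grated', 0.0, 1.5, 2.0, 20.0, 5.0],
        ['sour cream', 1.0, 5.0, 1.0, 60.0, 30.0],
        ['chicken breast', 0.0, 3.0, 22.0, 120.0, 112.0],
        ['potato', 28.0, 0.0, 3.0, 110.0, 148.0]]

# List comprehension with a condition
filtered = [a for a in data if a[3] / a[4] > 0.09]

# The same filter written as an explicit loop
filtered_loop = []
for a in data:
    if a[3] / a[4] > 0.09:
        filtered_loop.append(a)

# Conditions can be combined, and the kept element can be transformed
low_fat_names = [a[0] for a in data if a[2] < 2 and a[1] > 0]

print([a[0] for a in filtered])
print(low_fat_names)
```

Filtering keeps the order of the source list, so the result here follows the file order rather than the sorted order shown above.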

Writing data to a file


CSV files
After sorting and filtering data, we might want to save it back into a file to
share with the rest of the world. For this we will need to open a file for
writing. It is done with the same open() function we used to open a file
for reading, but we pass 'wt' as the second parameter. It opens a file for
writing in text mode.

open('ingredients_sorted.txt', 'wt')

Before writing numbers into a text file, we need to convert them into strings
with the str() function. List comprehension allows us to do it in one line for
a single record, here called food, taken from data.

data_fields = [str(a) for a in food]

The first field is the food's name and doesn't need to be converted, but the
str() function is forgiving and simply returns a string unchanged while it
converts every other field to a string.
After converting all the fields, we need to join them into one line. We use
the string's join() method.

'\t'.join(data_fields)

This method, when applied to a string, takes a list of strings and joins them
by inserting the string it is applied to between them. We need to add an end
of line character to this string, and save it in a file using the file's write()
method.
The fragment of code that will write our sorted data to the file
ingredients_sorted.txt will look like this:

with open('ingredients_sorted.txt', 'wt') as f:
    # join header fields with tab,
    # add an end of line character and write to a file
    f.write('\t'.join(header) + '\n')
    for food in data:
        # convert numbers back into strings
        food_line = [str(a) for a in food]
        # join fields with tab,
        # add an end of line and write to a file
        f.write('\t'.join(food_line) + '\n')
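As an alternative to joining the fields by hand, the standard library module csv can write delimited files. The sketch below swaps in that technique and is not the method used in this book. It assumes a shortened header and data, and writes to an in-memory io.StringIO buffer instead of a real file so that the result can be inspected directly.

```python
import csv
import io

header = ['food', 'carb', 'fat']
data = [['pasta', 39.0, 1.0], ['sour cream', 1.0, 5.0]]

# Stand-in for open('ingredients_sorted.txt', 'wt', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(header)  # one row from a list of strings
writer.writerows(data)   # numbers are converted to strings for us
text = buffer.getvalue()
print(text)

# Reading it back with csv.reader yields lists of strings again
rows = list(csv.reader(io.StringIO(text), delimiter='\t'))
print(rows)
```

The csv module also takes care of quoting fields that themselves contain the delimiter, which the hand-written join() does not.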

XLSX files
If text CSV files are not enough, you can use XlsxWriter, a Python extension
that allows you to write Microsoft Excel XLSX files and use the extra
features offered by this format. XlsxWriter is provided by the Anaconda
distribution. If you use WinPython and XlsxWriter is missing, you can
download it from the PyPI repository using the pip tool from the
WinPython-provided command prompt window:

pip install xlsxwriter

Alternatively, you can download it from Christoph Gohlke's page
mentioned above, and install it using the WinPython control panel.
The following example shows how you can create an Excel file with
heatmap highlighting of every row:

import xlsxwriter

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('ingredients.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_row(0, 0, header)  # write header
for i, line in enumerate(data):
    worksheet.write_row(i + 1, 0, line)  # write data row
    # Heatmap for major nutrient content
    worksheet.conditional_format(i + 1, 1, i + 1, 3,
                                 {'type': '2_color_scale',
                                  'min_type': 'num',
                                  'max_type': 'num',
                                  'min_value': min(line[1:4]),
                                  'max_value': max(line[1:4]),
                                  'min_color': "#0000FF",
                                  'max_color': "#FFFF00"})
workbook.close()  # Save workbook

When the file is opened in Excel or LibreOffice Calc, the carb, fat, and
protein cells of every row are shaded on a scale from blue (the smallest
value in the row) to yellow (the largest value in the row).
XlsxWriter allows the creation of workbooks with multiple worksheets, the
use of fancy formatting, and even including graphs in xlsx files. If you need
these features, see the excellent online documentation at the XlsxWriter
website.
Using Jupyter notebook for user interaction
Display tabular data in IPython notebook
While working in an IPython notebook, in addition to plain text your
program can produce output in HTML format. This code snippet shows
how to format food data as an HTML table and display it in an IPython
notebook. The HTML class needs to be imported from the IPython.display
module. Because the code for making the table might be reused, we will put
it in a separate function that accepts two arguments:

header - list of strings that are column headers
data - list of lists representing rows

Row elements might be either strings or numbers - we will convert them to
strings on the fly.

from IPython.display import HTML

def make_table(header, data):
    # Constructing an HTML table
    html_string = '<table border="1">'
    html_string += '<tr style="background: #ccc;"><th>'
    html_string += '</th><th style="width: 7em; text-align: right;">'.join(
        header) + "</th></tr>"  # make header row
    for food in data:
        food_list = [str(a) for a in food]
        html_string += "<tr><td>"
        html_string += '</td><td style="width: 7em; text-align: right;">'.join(
            food_list) + "</td></tr>"  # make data row
    html_string += "</table>"
    return HTML(html_string)  # Create an HTML object from a string

t = make_table(header, data)  # Create table from header and data
t  # show it in the IPython notebook
The generated table is shown in the notebook:

food             carb  fat  protein  calories  serving size
chicken breast   0.0   3.0  22.0     120.0     112.0
parmesan grated  0.0   1.5  2.0      20.0      5.0
pasta            39.0  1.0  7.0      210.0     56.0
potato           28.0  0.0  3.0      110.0     148.0
sour cream       1.0   5.0  1.0      60.0      30.0
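If cell text may contain characters such as & or <, it can be escaped before it is placed into the HTML string. The sketch below is an assumption-laden variation, not the function used in this book: make_table_html is a hypothetical helper that returns a plain string, drops the styling, and uses escape() from the standard library module html, so it runs without IPython. The 'mac & cheese' row is invented for the example.

```python
from html import escape

def make_table_html(header, data):
    """Return the table as a plain HTML string (hypothetical helper)."""
    head = ''.join('<th>%s</th>' % escape(str(h)) for h in header)
    rows = ['<tr>' + head + '</tr>']
    for row in data:
        cells = ''.join('<td>%s</td>' % escape(str(a)) for a in row)
        rows.append('<tr>' + cells + '</tr>')
    return '<table border="1">' + ''.join(rows) + '</table>'

html_string = make_table_html(['food', 'carb'], [['mac & cheese', 30.0]])
print(html_string)
```

Escaping also neutralizes markup that was put into a cell on purpose, for example a bold label, so it is best applied only to text that comes from a data file.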

Adding user interaction


While exploring data, it is often useful to play with several parameters
interactively to find optimal values. IPython notebook allows such
interactivity with a simple user interface you can embed in notebooks. This
user interface is relatively limited - you cannot create complex widget
layouts and some user interface functions, such as selecting and opening a
file, are not available for security reasons. We will explore creating real
graphical user interface (GUI) programs later. Now, let’s add some
interactive tools to our notebook.
We will create an IPython notebook that calculates the nutrient and energy
content of a dish using our ingredient table. The dish will contain chicken
breast, pasta, and parmesan. The user will adjust the amount of each
ingredient in grams using sliders. The notebook will calculate the amounts
of major nutrients and the energy value of the entire dish - showing this
information in a table and displaying a pie chart that shows the relative
amounts of major nutrients.
For this we will need to import some facilities from several libraries:

HTML to show an HTML formatted table
interact to create sliders that allow the user to adjust the amounts of ingredients
pyplot to create graphs

We will also use IPython’s 'magic' command that tells it to embed graphs in
a notebook:

%matplotlib inline

Without this command, plots will appear in a separate window. The
program header will look like this:

from IPython.display import HTML
from ipywidgets import interact
import matplotlib.pyplot as plt
%matplotlib inline

To easily look up ingredient information given an ingredient's name, we will
store this data not in a list as we did before but in a dictionary. Unlike
lists, dictionaries are containers of key:value pairs in which a value is
looked up by its key rather than by its position. Python's strings
can serve as keys; so, we will use the ingredient names as keys and their
data as values. The part of the program dealing with reading food data from
the file will change slightly:

data = {}  # create an empty dictionary to hold data
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            header = fields  # remember the header
            continue
        food = fields[0].lower()
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:
            print(i, line)
            print(i, fields)
            continue
        data[food] = numbers  # store food info under the food's name
print(data)

Now, the food name becomes a key in the dictionary. It must be unique. If
we have two foods with identical names in our file, the information for the
earlier one will be overwritten. The value stored for each food is a list of
the carbs, fat, protein, and calories per serving, followed by the serving size.
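The overwriting behaviour is easy to check on a tiny dictionary. In this sketch the second set of pasta values is invented to make the replacement visible; the get() method shown at the end is a standard dictionary method that avoids an error for missing keys.

```python
data = {}
data['pasta'] = [39.0, 1.0, 7.0, 210.0, 56.0]
data['potato'] = [28.0, 0.0, 3.0, 110.0, 148.0]
data['pasta'] = [40.0, 1.0, 7.0, 215.0, 56.0]  # same key: old value is replaced

count = len(data)             # number of distinct keys
calories = data['pasta'][3]   # calories per serving of the surviving record
has_rice = 'rice' in data     # membership test by key
missing = data.get('rice')    # get() returns None instead of raising KeyError

print(count, calories, has_rice, missing)
```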
Every time the user adjusts the amount of an ingredient, we will need to
calculate the amounts of carbs, fat, and protein supplied by each component
and calculate the total amounts of carbs, fat, protein, and calories in a dish.
Let's write a function that will take the food name and the quantity of it in a
dish and calculate how many carbs, fats, proteins, and calories this
ingredient contributes to the dish:

def get_share(food, quantity):
    serving = data[food][4]
    nutrients = [round(a * quantity / serving, 2)
                 for a in data[food][:4]]
    return nutrients

First, we look up the serving size for a given ingredient that is stored as the
last element of the food description. Then, we use list comprehension on the
per serving values of the carbs, fats, proteins, and calories stored in
positions 0 through 3 by dividing the per serving values by serving size,
multiplying them by the amount of the ingredient in a dish, rounding to two
decimal positions, and returning the result.
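A worked example makes the arithmetic concrete. The sketch repeats get_share with a one-entry dictionary so that it runs on its own; the quantity of 50 grams is an arbitrary choice.

```python
# Per serving values for pasta: carb, fat, protein, calories, serving size (g)
data = {'pasta': [39.0, 1.0, 7.0, 210.0, 56.0]}

def get_share(food, quantity):
    serving = data[food][4]
    return [round(a * quantity / serving, 2) for a in data[food][:4]]

share = get_share('pasta', 50)         # 50 g is 50/56 of one serving
full_serving = get_share('pasta', 56)  # exactly one serving
print(share)
print(full_serving)
```

For 50 grams of pasta the carbs come to 39 * 50 / 56 = 34.82 grams, and asking for exactly one serving (56 grams) returns the per serving values unchanged.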
Next, we need to create a function that will be called every time the user
adjusts the amount of an ingredient. This function will take as parameters
all the values the user can adjust using interactive widgets. The default
values of the parameters allow IPython to guess what values are expected
and what widgets to use for the interface. We want to adjust the amounts of
ingredients in grams, which should be floating point numbers between 0 and
100. The names of the parameters will be used as labels for the widgets. So,
the function will look like this:

def show_results(chicken=(0.0, 100.0),
                 pasta=(0.0, 100.0),
                 parmesan=(0.0, 100.0)):
    dish_content = [['<b>Total:</b>', 0, 0, 0, 0]]
    dish_content.append(['chicken'] +
                        get_share('chicken breast', chicken))
    dish_content.append(['pasta'] +
                        get_share('pasta', pasta))
    dish_content.append(['parmesan'] +
                        get_share('parmesan grated', parmesan))
    for column in range(1, 5):  # get sum for each column
        column_sum = sum([a[column]
                          for a in dish_content[1:]])
        dish_content[0][column] = round(column_sum, 2)
    # draw a pie plot
    plt.pie(dish_content[0][1:4],
            autopct='%1.1f%%',
            labels=('carbs', 'fat', 'protein'),
            colors=('palegreen', 'gold', 'salmon'))
    plt.show()
    return make_table(header[:-1], dish_content)

The function itself must draw a table. For this it uses the make_table
function we created earlier. make_table needs a list of strings
used as column headers and a list of lists representing the rows of the table.
For a header, we can use the header we have read from a file, discarding the
last element "serving size".
The variable dish_content will contain table data as a list of rows.
Rows are lists containing the ingredient name in the first element followed
by the contribution of the given ingredient to the total number of carbs, fats,
proteins, and calories in the dish.
The first row shows the total content in a dish. So, we create a row
["Total:", 0, 0, 0, 0] and calculate the total carbs, fats, proteins, and
calories later when the contributions of all the ingredients are known.
The contribution of each component is calculated by the function get_share.
Finally, we go through the columns of the created table and fill the Total
values in the first row by summing up the values of each column. List
comprehension on dish_content is used to get a list of values in each
column, and the built-in sum() function is used to sum them up.
The column sums are stored in the first row of the table.
The total values of the 3 major nutrients are then used to create a pie chart
for the dish content. The function pie from matplotlib's pyplot module takes
the amounts of carbs, fats, and proteins, and draws a pie chart using them.
The parameters labels and colors are self-explanatory and define the labels
and colors of the sectors, while the parameter autopct defines a formatting
string for the percentage of each nutrient in a dish. It means that the
percentage should be printed as a floating point number with at least one
digit before the decimal point and one after the decimal point, followed by
'%'. plt.show() is called to show the chart in the notebook.
Finally, the function show_results calls make_table with a header and a
list of rows and returns the HTML element, which ensures the table will be
shown in the notebook.
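The column sums and the percentage format can be checked without drawing anything. In the sketch below the two ingredient rows are typed in by hand (they correspond to 50 grams each of chicken breast and pasta), and the last step only approximates what the pie chart does with autopct, namely applying the format string to each share expressed in percent.

```python
dish_content = [['Total:', 0, 0, 0, 0],
                ['chicken', 0.0, 1.34, 9.82, 53.57],
                ['pasta', 34.82, 0.89, 6.25, 187.5]]

# Fill the first row with the sum of each numeric column
for column in range(1, 5):
    column_sum = sum(a[column] for a in dish_content[1:])
    dish_content[0][column] = round(column_sum, 2)
totals = dish_content[0]
print(totals)

# Share of each major nutrient, formatted the way autopct would format it
nutrients = totals[1:4]
percent_labels = ['%1.1f%%' % (100 * n / sum(nutrients)) for n in nutrients]
print(percent_labels)
```

In the format string, %1.1f prints a number with one digit after the decimal point and %% stands for a literal percent sign.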
To create a user interface and an event loop that will automatically call the
show_results function, we use the function interact imported from
ipywidgets by giving it the name of the function to call when the user
changes any controls:

interact(show_results)

Here is the entire listing:

from IPython.display import HTML
from ipywidgets import interact
import matplotlib.pyplot as plt
%matplotlib inline

data = {}  # create an empty dictionary to hold data
with open('ingredients.txt', 'rt') as f:
    for i, line in enumerate(f):
        fields = line.strip().split('\t')
        if i == 0:
            header = fields  # remember the header
            continue
        food = fields[0].lower()
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:
            print(i, line)
            print(i, fields)
            continue
        data[food] = numbers  # store food info under the food's name
print(data)

def make_table(header, data):
    # Constructing an HTML table
    html_string = '<table border="1">'
    html_string += '<tr style="background: #ccc;"><th>'
    # make header row
    html_string += '</th><th style="width: 7em; text-align: right;">'.join(
        header) + "</th></tr>"
    for food in data:
        food_list = [str(a) for a in food]
        # make data row
        html_string += "<tr><td>"
        html_string += '</td><td style="width: 7em; text-align: right;">'.join(
            food_list)
        html_string += "</td></tr>"
    html_string += "</table>"
    # Create an HTML object from a string
    return HTML(html_string)

def get_share(food, quantity):
    serving = data[food][4]
    nutrients = [round(a * quantity / serving, 2)
                 for a in data[food][:4]]
    return nutrients

def show_results(chicken=(0.0, 100.0),
                 pasta=(0.0, 100.0),
                 parmesan=(0.0, 100.0)):
    dish_content = [['<b>Total:</b>', 0, 0, 0, 0]]
    dish_content.append(['chicken'] +
                        get_share('chicken breast', chicken))
    dish_content.append(['pasta'] +
                        get_share('pasta', pasta))
    dish_content.append(['parmesan'] +
                        get_share('parmesan grated', parmesan))
    for column in range(1, 5):  # get sum for each column
        column_sum = sum([a[column]
                          for a in dish_content[1:]])
        dish_content[0][column] = round(column_sum, 2)
    # draw a pie plot
    plt.pie(dish_content[0][1:4],
            autopct='%1.1f%%',
            labels=('carbs', 'fat', 'protein'),
            colors=('palegreen', 'gold', 'salmon'))
    plt.show()
    return make_table(header[:-1], dish_content)

interact(show_results)

If you find calculations to be time consuming, you might consider another
function - interact_manual. It adds a button to the interface and its target
function will only be called when the button is pressed. So you can adjust
several controls without waiting for unnecessary calculations to complete.
Interact also offers other kinds of widgets: check box, text entry box, drop
down list. The widget kind is guessed from the default arguments of a target
function. The following snippet will create an interface that consists of a
check box, text entry box, drop down lists and a Run button you need to
push to call the target function that prints the parameters it receives.

from ipywidgets import interact_manual

def target(a=True,
           b='abc',
           c=['one', 'two', 'three'],
           d={'apple': 1, 'pear': 2, 'orange': 3}):
    print(a, b, c, d)
    return

i = interact_manual(target)

Alternatively, the function interactive might be used to create a control box
that serves as a parent of multiple widgets. These widgets might be
constructed by explicit constructor calls, which gives the user more control.
In addition, the state of those controls might be altered from within a
program. See the IPython documentation for more information on available
controls.

GUI programming with TkInter.


Even if you just use Python for exploration, other people might want to use
your code. It might be your boss, your colleagues, your significant other or
the industry at large. It happens more often than you might think. You can
hack your code around, but other users need an intuitive graphical user
interface. You might want it too - a year or two down the road when you
have forgotten how the program works.
IPython provides basic widgets, but they are pretty limited in what they can do.
You cannot position widgets precisely, and, if you have many of them, the
interface becomes a mess. Some features, such as the dialogs to choose a file
to open or save, are absent for security reasons.
Fortunately, there are some fully fledged GUI toolkits available for Python.
They are TkInter, distributed as a part of the Python standard library, and Qt,
a proprietary toolkit with the free Python binding PySide. There is a binding
for the popular wxWidgets library for Python 2.x called wxPython. But even
now, 10 years after Python 3 was introduced, there are no production
quality wxWidgets bindings for Python 3. Project Phoenix is working on
this transition, and, eventually, a binding might be available.
Finally, there are GUI toolkits like Enthought's TraitsUI that can use either
wxWidgets or Qt in the background, insulating you from problems with
either library. But, to be on the safe side, let's focus on Python's standard
TkInter widget library. As an added bonus, using the standard library toolkit
means one less dependency when it is time to deploy your application.
Tkinter is based on pretty old technology, Tcl/Tk. Widgets don't look
native. Some widgets that would be highly relevant for our data analysis
problems, such as a spreadsheet-like grid, are conspicuously absent. There are
some solutions though. You can relatively easily implement a spreadsheet-like
grid yourself using TkInter's Canvas widget or use one of several add-on
packages such as PandasTable. This widget uses the DataFrame from the
popular data analysis library Pandas to hold the data that is to be displayed in
the grid. The Pandas DataFrame is a Python implementation of a similar
concept from the R programming language popular with statisticians. It is
very handy to display a DataFrame to users in a few lines of code and allow
them to manipulate it graphically. But I am not going to describe either
Pandas or PandasTable in this book.

Tkinter application
The simplest Tkinter application can be written in just 4 lines of code like
this:

import tkinter as tk
root = tk.Tk()
root.mainloop()
root.destroy()

Actually, root.destroy() is only needed if you develop in IDLE or some
other development environments. So, the shortest possible Tkinter GUI
program is just 3 lines.
Of course, we only get an empty window. But, now we can fill it with
widgets. The placement of widgets is handled by layout managers. I find
the grid layout manager to be the most convenient for all of my needs. It
places widgets in cells of a two dimensional grid. After creating a widget,
you call its grid method supplying row, column, and alignment parameters,
and the widget gets placed in the desired cell.
In addition to row and column, you can supply rowspan and columnspan
parameters if the widget is large and should span several rows or columns.
Cells do stretch to accommodate larger widgets, but this might spread
neighboring smaller widgets too far apart. You can align widgets inside
cells using the sticky parameter that can be North, East, South, West, or any
combination of them. The tkinter module defines the constants N, E, S, W to
represent widget alignment. To force the widget to occupy the entire cell,
you can pass sticky=N+E+S+W.
The simplest widget is a label. Let's create one and put it in a window.

import tkinter as tk
root = tk.Tk()
tk.Label(root, text="Label").grid(column=0, row=0)
root.mainloop()
root.destroy()

tkinter variables
If you want to change the displayed text at runtime, you need to link a
variable to your label. Tkinter provides a version of Python’s basic types
that can notify widgets or call callback functions when the value of the
variable changes. The types available are:
BooleanVar
DoubleVar
IntVar
StringVar
To connect a callback function that is called whenever a variable changes,
use the trace method. It takes two parameters: the mode and the name of
the callback function. The mode can be:
"w" - callback is called whenever the variable is assigned a new
value.
"r" - callback is called when the variable value is read by the get
method.
"u" - callback is called when the variable is deleted.
You can set the value by calling the variable's set() method and get it by
using the get() method.
The label can be connected to a variable by passing the variable as the
Label constructor's textvariable parameter:

import tkinter as tk
root = tk.Tk()
text = tk.StringVar(root, "Label 2")
tk.Label(root, textvariable=text).grid(column=0, row=0)
root.mainloop()

Now, if you change the variable, the text label on the screen will change
automatically to reflect the new value. Let’s change the variable in response
to a button click. The button constructor takes the following parameters: the
parent window, the text that will appear on the button, and the function that
will be called when the button is clicked.

Button
Let’s create a label connected to a StringVar, which will be initialized with
"Clicks: 0", and write a function increment() that will get this string, split
it, increment the number, construct a new string with the increased number,
and put it back into the StringVar variable to be displayed in the label.

import tkinter as tk
root = tk.Tk()
text = tk.StringVar(root, "Clicks: 0")

def increment():
    fields = text.get().split()
    text.set(fields[0] + ' ' + str(int(fields[1]) + 1))

tk.Button(root, text="Push me", command=increment).grid(column=0, row=0)
tk.Label(root, textvariable=text).grid(column=1, row=0)
root.mainloop()

Slider
Several other widgets have command parameters and can call a function
every time the user alters them. One of them is the slider. The slider's
constructor in tkinter is called Scale. It takes the following parameters:
parent - parent window
from_ - lower bound (the trailing underscore distinguishes it from the
reserved word from)
to - upper bound
orient - orientation; tkinter defines the constants HORIZONTAL and
VERTICAL to set this parameter
command - function to call when the user moves the slider
There are many more parameters. For instance, resolution, which you can
use to get a step smaller or larger than the default 1. It might make the slider
return floating point values. For a complete set of parameters, see the online
documentation.
The current position of a slider can be queried using the method get() and
set using the method set(). Usually, if you want to use them, you need to
keep the reference returned by the constructor. You don't always have to
though. You rarely set the slider position programmatically, and the slider
passes its current value to the callback function as a parameter.
Let’s make a slider that will go from 0 to 100 and control the label:

import tkinter as tk
root = tk.Tk()

def slider_response(n):
    fields = text.get().split()
    text.set(fields[0] + ' ' + str(n))

text = tk.StringVar(root, "Clicks: 0")

tk.Label(root, textvariable=text).grid(column=0, row=0)
my_slider = tk.Scale(root, from_=0, to=100,
                     orient=tk.HORIZONTAL, command=slider_response)
my_slider.grid(column=0, row=1)
root.mainloop()

Entry and Text widgets


Entry and Text widgets allow the user to enter text. Entry allows a single
line of text whereas Text allows multiline text. Unlike the slider, these
widgets don't have command parameters, but you can connect a StringVar
to an Entry using the textvariable constructor parameter. Then, a
callback function can be attached to this StringVar to call a function
when the user types in the widget window. The Text widget has no
textvariable option.
Text entered by the user can be obtained by the get() method of the Entry or
the Text widget. For Text, get() takes the start and end positions, for
example get("1.0", "end"). To use this, you need to keep a reference to the
widget returned by the constructor.
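
The Entry workflow described above might be sketched as follows. This is only an illustration under stated assumptions, not code from the diet calculator: the names word_count and main are hypothetical, and the sketch uses trace_add, which Python 3.6 and later offer as the replacement for the older trace method. A live word count is shown for the Entry, while the Text widget is read on demand.

```python
def word_count(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


def main():
    # tkinter is imported here so that word_count() stays usable
    # on machines where Tk is not installed.
    import tkinter as tk

    root = tk.Tk()
    typed = tk.StringVar(root, "")
    summary = tk.StringVar(root, "Words: 0")

    def on_change(*args):
        # trace callbacks receive (variable name, index, mode); ignored here
        summary.set("Words: " + str(word_count(typed.get())))

    typed.trace_add("write", on_change)

    tk.Entry(root, textvariable=typed).grid(column=0, row=0)
    tk.Label(root, textvariable=summary).grid(column=0, row=1)

    # A Text widget has no textvariable, so it is read with explicit positions.
    notes = tk.Text(root, height=4, width=30)
    notes.grid(column=0, row=2)

    def show_notes():
        # "1.0" is line 1, character 0; "end-1c" drops the trailing newline
        summary.set("Words: " + str(word_count(notes.get("1.0", "end-1c"))))

    tk.Button(root, text="Count notes", command=show_notes).grid(column=0, row=3)
    root.mainloop()
```

Call main() to open the window. Keeping the counting logic in a plain function makes it easy to check without starting the GUI.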

Combobox
This widget draws a dropdown box that allows the user to select one of
several choices. We will use this widget to let the user choose an ingredient
for a dish.
Combobox is derived from Entry and resides in tkinter's ttk module that
holds themed Tk widgets. The Combobox constructor takes the
values parameter that supplies a list of choices. The current value can be
obtained by the get() method and set by the set() method. Of course,
you need to keep the reference returned by the constructor to call the
methods.
Unfortunately, there is no command parameter to attach a callback that is
called when the user makes a new selection. But Combobox generates an
event <<ComboboxSelected>> that we can bind to a callback function.
Let’s write a program that will take the user's choice and show it in a label.

import tkinter as tk
from tkinter import ttk
root = tk.Tk()
text = tk.StringVar(root, "One")
tk.Label(root, textvariable=text).grid(column=0, row=0)

def combobox_response(n=None):
    text.set(my_combo.get())

my_combo = ttk.Combobox(root, values=["One", "Two", "Three"])
my_combo.grid(column=0, row=1)
my_combo.set("One")
my_combo.bind('<<ComboboxSelected>>', combobox_response)
root.mainloop()

Of course, you don't need a callback function here and might just link the
Label and Combobox to the same StringVar using the textvariable
constructor parameter, but we will need the callback function later on. A
callback bound to an event receives a parameter describing the event. For
instance, for a mouse event it holds the x and y of the
mouse cursor. In our case, we don't use the parameter, and just use a
dummy in the function definition to avoid error messages.
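
The callback-free alternative mentioned above could look like the following sketch. It assumes only that both widgets accept the textvariable parameter; the function name main and the variable name choice are hypothetical.

```python
def main():
    # Imported inside main() so that defining it does not require Tk.
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    choice = tk.StringVar(root, "One")
    # Both widgets share one variable, so no callback is needed:
    # selecting an item updates the variable, which updates the label.
    tk.Label(root, textvariable=choice).grid(column=0, row=0)
    ttk.Combobox(root, textvariable=choice,
                 values=["One", "Two", "Three"]).grid(column=0, row=1)
    root.mainloop()
```

Call main() to open the window.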

Menu
Tkinter's menus can be created with the Menu constructor that takes a
parent window as a parameter. Commands are added by the add_command
method that takes two parameters:
label - string shown to the user
command - name of a callback function

import tkinter as tk
root = tk.Tk()

menubar = tk.Menu(root)
menubar.add_command(label="File")
menubar.add_command(label="Edit")
menubar.add_command(label="Help")
root.config(menu=menubar)
root.mainloop()

Submenus are constructed as separate menus. The constructor should get a
parent menu as the parent parameter, and the submenu should be attached to
the parent by calling the parent's add_cascade method that takes parameters:
label - string shown to the user
menu - name of a submenu
tkinter's dropdown menus have a tear line, a vestige not used in any
modern GUI style. To suppress it, we need the line
root.option_add('*tearOff', False)
Let’s write a program demonstrating the dropdown menu.

import tkinter as tk
root = tk.Tk()
root.option_add('*tearOff', False)

menubar = tk.Menu(root)

file_menu = tk.Menu(menubar)
file_menu.add_command(label="Open")
file_menu.add_command(label="Quit")
menubar.add_cascade(label="File", menu=file_menu)
menubar.add_command(label="Edit")
menubar.add_command(label="Help")
root.config(menu=menubar)
root.mainloop()

File open and file save dialogs


Tkinter offers dialogs to choose the file for opening or saving. There is also
a dialog to choose a directory. These facilities are provided by tkinter's
filedialog module. The string containing the name of the user selected file
can be obtained with a single line of code:
from tkinter import filedialog
file_name = filedialog.askopenfilename()
You can also request a file name to save the file with:
file_name = filedialog.asksaveasfilename()

There are also versions of these functions that open the file for you, named
askopenfile and asksaveasfile respectively. They take an extra parameter,
mode ("r" for read and "w" for write), and return a file object.
All of these functions accept some useful optional parameters:
initialdir - starts a dialog in a particular directory.
initialfile - suggests an initial file name.
filetypes - lets the user choose among file types and their
associated extensions.

file_types = [('any file', '.*'),
              ('text file', '.txt'),
              ('csv file', '.csv')]
file_name = filedialog.askopenfilename(filetypes=file_types)

Unfortunately, this is not supported on OS/X.


Another useful dialog provided by the filedialog module is askdirectory,
which allows the user to choose a directory.
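
As a rough sketch of the file-object variants and the directory dialog, the following assumes nothing beyond the filedialog functions named above. The helper count_lines and the function main are hypothetical names introduced for illustration.

```python
def count_lines(file_obj):
    """Count the lines of an already opened text file."""
    return sum(1 for _ in file_obj)


def main():
    # Imported here so count_lines() can be used without Tk installed.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty main window; only the dialogs are shown

    # askopenfile returns an open file object, or None if the user cancels
    f = filedialog.askopenfile(mode="r", initialdir=".")
    if f is not None:
        with f:
            print(f.name, "has", count_lines(f), "lines")

    folder = filedialog.askdirectory()  # empty if the user cancels
    if folder:
        print("Chosen directory:", folder)
    root.destroy()
```

Call main() to try the dialogs. Checking the return value before use matters because every dialog can be cancelled.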

Diet calculator using Tk


Now, with all prerequisites done, we can start building a realistic
application to do something useful. Let's give the diet calculator we have
developed in previous chapters a useful user interface.
It will let the user choose ingredients from all the foods listed in the
ingredients file, adjust the amounts of all these ingredients, and calculate
the total amounts of major nutrients and the energy value of the dish.
The interface is rather monotonous. The three major nutrients and calories
are treated in the same way. The same is true for the ingredients of the dish.
So we will use for loops to lay out the interface instead of manually
creating and placing every widget. Whenever we want to keep a reference
to a widget to get its value, we will use lists. As a bonus, this design
allows us to easily scale our calculator from just 3 ingredients to 5, 10,
etc.
We lay out labels for the major nutrients. The label text is supplied in a
single line that is split into a list at the white space by the string's split()
method. The variable col, holding the column in which to place a label, is
incremented automatically to position labels in separate columns.

# Place nutrient labels
row = 0
col = 0
for label in "Fat Carbs Protein Calories".split():
    tk.Label(root, text=label).grid(row=row, column=col, padx=10)
    col += 1

Then, we create variables that will hold the amounts of nutrients and
calories to be displayed to the user. This part has to be done manually one
variable at a time. Fortunately, we have only four variables to worry about.

fat = tk.StringVar(root, '0')
carbs = tk.StringVar(root, '0')
protein = tk.StringVar(root, '0')
calories = tk.StringVar(root, '0')

These are string variables because the label widgets require a StringVar to
hold their text. All variables are initialized with "0".
The next step is to put a row of widgets below the previous one to display
the amounts of the respective nutrients.
col = 0
row += 1
for indicator in (fat, carbs, protein, calories):
    tk.Label(root, textvariable=indicator).grid(row=row, column=col)
    col += 1

The row variable is increased to start the next row. The col variable is
reinitialized with 0 to start from the first column again. Variables to be
bound to the labels are supplied as a tuple to the for statement.
Now, let’s give the user a way to select ingredients and adjust their
amounts. We don't want to hardcode the number of ingredients in the dish.
Instead, we put it in a variable n_ingredients. We want to keep the
references to dropdown lists that hold the ingredient selections and sliders
holding the ingredient amounts. So, we need two lists to hold the
references.

n_ingredients= 5
ingredient= []
amounts= []

Let’s place the ingredient controls now. The controls for each ingredient are
placed on a separate row. Labels for ingredients are generated automatically
by adding the ingredient's number to the end of the string "Ingredient " and
placed in column 0. The combo box for ingredient selection and the slider
for adjusting the amounts are placed in columns 1 and 2 respectively. Each
new widget is added to the corresponding list and then accessed as the last
element of the list, ingredient[-1] or amounts[-1] respectively. Combo boxes
are initialized with a list holding the single item "None". Both the combo box
<<ComboboxSelected>> event and the sliders are bound to a callback
function recalculate() that will poll the ingredient widgets and display the
nutrients and calories for the user. We will write it later.

row += 1
col = 0
for i in range(n_ingredients):
    w = tk.Label(root, text="Ingredient " + str(i + 1))
    w.grid(row=row, column=col, padx=10, sticky=tk.S)
    ingredient.append(ttk.Combobox(root, values=["None"]))
    ingredient[-1].set('None')
    ingredient[-1].bind('<<ComboboxSelected>>', recalculate)
    ingredient[-1].grid(row=row, column=col + 1, sticky=tk.S)
    amounts.append(tk.Scale(root, from_=0, to=150,
                            orient=tk.HORIZONTAL, command=recalculate))
    amounts[-1].grid(row=row, column=col + 2)
    row += 1

The function to load the ingredients file will be almost identical to the one
we wrote in an earlier chapter. Like before, we will keep the nutrition facts
for ingredients in a global dictionary ingredient_data. There are two
differences though:
We will use the tkinter file open dialog to let the user choose an
ingredients file.
The ingredient names will be used to initialize dropdown boxes for
ingredient selection.
Actually, we will add one more string to the selection choices, "None",
representing the choice to not select an ingredient.

def load_ingredients():
    file_name = filedialog.askopenfilename()
    if not file_name:  # the user cancelled the dialog
        return
    ingredient_data.clear()
    with open(file_name, 'rt') as f:
        for i, line in enumerate(f):
            fields = line.strip().split('\t')
            if i == 0:
                header = fields
                continue
            food = fields[0].lower()
            try:
                numbers = [float(n) for n in fields[1:]]
            except ValueError:
                # report the corrupt line and skip it
                print(i, line)
                print(i, fields)
                continue
            ingredient_data[food] = numbers
    spinbox_list = ['None'] + list(ingredient_data.keys())
    for i in ingredient:
        i.config(values=spinbox_list)
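
One way to make the parsing step easier to check is to separate it from the dialog and the widgets. The sketch below is an assumption about how that split might look, not the function used in the rest of the chapter: parse_ingredients is a hypothetical name, and the sample rows and column names are made up. It assumes the same tab-separated layout, with a header line followed by a food name and numeric fields.

```python
def parse_ingredients(lines):
    """Parse tab-separated lines into (header, {food: [numbers]})."""
    header = []
    data = {}
    for i, line in enumerate(lines):
        fields = line.strip().split('\t')
        if i == 0:
            header = fields
            continue
        food = fields[0].lower()
        try:
            numbers = [float(n) for n in fields[1:]]
        except ValueError:
            # corrupt line: report it and move on
            print("skipping line", i, repr(line))
            continue
        data[food] = numbers
    return header, data


# Hypothetical sample; the column names and values are made up.
sample = [
    "Food\tA\tB\tC\tD\tServing\n",
    "Apple\t1\t2\t3\t4\t100\n",
    "Broken\t1\toops\t3\t4\t100\n",
]
header, data = parse_ingredients(sample)
print(data)
```

Because the function accepts any iterable of lines, it works equally well with an open file object or with a list of strings as above.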

This function will be called from the menu. Let’s construct the menu system
and connect the load_ingredients function to the File/Open menu choice. In
addition, let’s add a File/Quit menu item to enable the user to quit the
application from the menu.

root.option_add('*tearOff', False)
menubar = tk.Menu(root)
file_menu = tk.Menu(menubar)
file_menu.add_command(label='Open ingredients',
                      command=load_ingredients)
file_menu.add_command(label='Quit', command=root.quit)
menubar.add_cascade(label='File', menu=file_menu)
root.config(menu=menubar)

The only missing piece now is a function to calculate the major nutrients
and energy value of the dish and update the respective labels. The function
will be used as a callback for the ingredient choice combo box event and the
ingredient amount sliders. Because we don't know if it is a combo box or a
slider that called the callback, we will ignore the passed parameter and poll
the ingredient controls using their get() methods.
The function will go through the list of ingredient selection drop boxes and
look up the nutrition facts for each from the ingredient_data dictionary.
We have one choice, "None", which is not in the dictionary though. For
convenience, it should return 0 for nutrients and calories and 1 for the
serving value, because we will need the ratio of amount/serving value to
calculate the contribution of each ingredient and don't want a division by 0
exception.
We could test explicitly if the user's choice is in the ingredient_data
dictionary, but the dictionary has a get() method that allows us to supply a
default value to return when a key is not in the dictionary. So, the lookup of
an ingredient's nutrition facts will look like this:

ingredient_data.get(food, [0, 0, 0, 0, 1])

If the food is in the dictionary, this line returns its nutrition facts. If not,
the list [0,0,0,0,1] is returned. Nutrition facts for all ingredients are
collected in a list contrib. The amounts of ingredients are collected in a list
amount. Then, we use a for loop to iterate through a tuple of StringVars. The
enumerate function gives us an easy way to obtain both the variable to be
updated and its position in the tuple, which we use to select the matching
nutrient or calories column in the ingredient info.

def recalculate(event=None):
    contrib = [ingredient_data.get(i.get(), [0, 0, 0, 0, 1])
               for i in ingredient]
    amount = [a.get() for a in amounts]
    for j, nutr in enumerate((carbs, fat, protein, calories)):
        total = sum([contrib[i][j] * amount[i] / contrib[i][4]
                     for i in range(n_ingredients)])
        nutr.set(str(round(total, 1)))
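
The arithmetic inside the callback can be tried without any widgets. The following is a minimal sketch under assumptions: dish_totals and DEFAULT are hypothetical names, and the nutrition numbers are made up. It follows the same convention of four values per serving followed by the serving size.

```python
DEFAULT = [0, 0, 0, 0, 1]  # zero nutrients, serving of 1 to avoid dividing by 0


def dish_totals(foods, grams, data):
    """Return four totals for the selected foods and their amounts."""
    contrib = [data.get(food, DEFAULT) for food in foods]
    return [round(sum(c[j] * g / c[4] for c, g in zip(contrib, grams)), 1)
            for j in range(4)]


# Hypothetical nutrition facts: four values per serving, then the serving size.
data = {"apple": [1.0, 2.0, 3.0, 4.0, 100.0]}

# "None" is not a key, so it falls back to DEFAULT and contributes nothing.
totals = dish_totals(["apple", "None"], [50, 120], data)
print(totals)
```

Half a serving of the sample food yields half of each of its four values, and the unselected slot adds zero regardless of its slider position.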

Below is the code for the entire program. Although you can run it from
an IPython notebook, I would recommend you save it in a .py file to run as a
standalone program. The program is cross platform. You can distribute it
easily as there are no outside dependencies. All you need to run it is a
standard Python installation. Your calorie counting friends might appreciate
this tool. Of course, you need to supply the ingredients.txt file too. The
format of this file is self explanatory and anyone can add more foods if they
wish to.

import tkinter as tk
from tkinter import ttk
from tkinter import filedialog

root = tk.Tk()
n_ingredients = 5     # Number of ingredients in a dish
ingredient_data = {}  # Dictionary to hold nutrition facts

def recalculate(event=None):
    contrib = [ingredient_data.get(i.get(), [0, 0, 0, 0, 1])
               for i in ingredient]
    amount = [a.get() for a in amounts]
    for j, nutr in enumerate((carbs, fat, protein, calories)):
        total = sum([contrib[i][j] * amount[i] / contrib[i][4]
                     for i in range(n_ingredients)])
        nutr.set(str(round(total, 1)))

def load_ingredients():
    file_name = filedialog.askopenfilename()
    if not file_name:  # the user cancelled the dialog
        return
    ingredient_data.clear()
    with open(file_name, 'rt') as f:
        for i, line in enumerate(f):
            fields = line.strip().split('\t')
            if i == 0:
                header = fields
                continue
            food = fields[0].lower()
            try:
                numbers = [float(n) for n in fields[1:]]
            except ValueError:
                # report the corrupt line and skip it
                print(i, line)
                print(i, fields)
                continue
            ingredient_data[food] = numbers
    spinbox_list = ['None'] + list(ingredient_data.keys())
    for i in ingredient:
        i.config(values=spinbox_list)

# Make variables to hold nutrient amounts
fat = tk.StringVar(root, '0')
carbs = tk.StringVar(root, '0')
protein = tk.StringVar(root, '0')
calories = tk.StringVar(root, '0')

# Place nutrient labels
row = 0
col = 0
for label in "Fat Carbs Protein Calories".split():
    tk.Label(root, text=label).grid(row=row, column=col, padx=10)
    col += 1

# Show nutrient amounts
col = 0
row += 1
for indicator in (fat, carbs, protein, calories):
    tk.Label(root, textvariable=indicator).grid(row=row, column=col)
    col += 1

# Show ingredients tools
ingredient = []
amounts = []
row += 1
col = 0
for i in range(n_ingredients):
    w = tk.Label(root, text="Ingredient " + str(i + 1))
    w.grid(row=row, column=col, padx=10, sticky=tk.S)
    ingredient.append(ttk.Combobox(root, values=["None"]))
    ingredient[-1].set('None')
    ingredient[-1].bind('<<ComboboxSelected>>', recalculate)
    ingredient[-1].grid(row=row, column=col + 1, sticky=tk.S)
    amounts.append(tk.Scale(root, from_=0, to=150,
                            orient=tk.HORIZONTAL, command=recalculate))
    amounts[-1].grid(row=row, column=col + 2)
    row += 1

# Build the menu
root.option_add('*tearOff', False)
menubar = tk.Menu(root)
file_menu = tk.Menu(menubar)
file_menu.add_command(label='Open ingredients',
                      command=load_ingredients)
file_menu.add_command(label='Quit', command=root.quit)
menubar.add_cascade(label='File', menu=file_menu)
root.config(menu=menubar)

root.mainloop()
root.destroy()

Deployment
Deployment is a relatively weak part of the Python world. There are many
excellent packages available. They are collected in nice distributions, but
your users have to download sometimes huge distributions to use your
program. Often, a program requires some additional packages that need to
be installed on top of the standard distribution. This is already a challenge
for a regular user. But should more than one Python distribution be present
on the system, the problem quickly gets out of control.
Fortunately, there are several packages that allow you to pack your
program, Python interpreter, and dependency libraries into a neat installer
or even one executable file. I'll show how to use pyinstaller, which allows
you to create installers for your program on all major platforms.
Even though pyinstaller is cross platform, you still need to package your
program on the platform for which you want to make your installer. If you
want to generate an installer for Windows, make sure your program works
on a Windows machine and package it there. If you want an installer for
OS/X, make your program work on an OS/X box and package it on a Mac.
Oh, and programs packaged on 64 bit systems will not run on a 32 bit
version of the same system.
If it is OS/X, Linux, or another UNIX-like platform, you may want to
package your program on the oldest available OS version. The distribution
is dynamically linked to the system's C library and can work with later
versions, but might fail on earlier ones. You don't need separate computers
for all the OSs though. You can use a VirtualBox environment to run a guest
OS on the same physical computer.
It is very handy that the Anaconda distribution works on all platforms. You
can install it and the necessary packages on all systems, and just transfer
your code to package it on each of them. Unfortunately, there might still be
some problems with packaging installers from Anaconda. At the time of
this writing, pyinstaller is only available as an Anaconda add-on package on
Linux, which might reflect problems on other platforms. But a few months
ago, OpenCV was also only available for the Linux version of Anaconda.
Now, it is available on Windows and OS/X too. So maybe pyinstaller will
become available for Anaconda on all platforms.
For now, on Windows and OS/X, you can install it from the Python
package index PyPI. Open your terminal window and run:

pip install pyinstaller

If you have more than one Python installation on your computer, you might
type the complete path to Anaconda's bin directory to make sure pyinstaller
is installed for the right distribution. On Windows, both Anaconda and
WinPython provide a dedicated command prompt shortcut that works with
the given distribution. You can find it in the Start menu for Anaconda or in
the WinPython folder respectively.
After pyinstaller is installed for your given Python distribution, all you have
to do is run a command from the command prompt window.

pyinstaller yourscript.py

where yourscript.py is the name of your main program module. If you have
a GUI program and want to suppress the console window on Windows or
OS/X, use:
pyinstaller -w yourscript.py

Pyinstaller packages the distribution version of your program in the folder
dist. A subfolder with the same name as the packaged program contains an
executable with the program's name as well as all the dependencies,
including Python itself. To distribute the program, all you have to do is give
the user a copy of this folder. Pyinstaller can also create a single file
executable and encrypt the Python program's byte code for you if you wish.
The size of the distribution is pretty big. The diet calculator takes over
20MB of disk space. A Zip compressed version is a little over 9MB. This
shouldn't be an issue with modern hard drives and Internet connection speed
though. It is definitely much smaller than the entire Anaconda distribution,
which is close to 400MB when compressed, and far less confusing than
making the user install Python and possibly the add on packages your
program might require.

High performance computing


Numeric computations with Numpy
Python is slow. Untyped variables that can hold a single number, nested
lists, or dictionaries give Python its flexibility and expressive power, but are
a nightmare at execution time. Each variable requires a structure that
remembers the variable's type. During one iteration of a loop, the variable
might be a number, during the next iteration it might be a string, and the next
time it might be a dictionary. So, each time a variable is used in an
expression, its type has to be checked and the appropriate action performed.
Large objects are not copied each time a variable is assigned; instead, a new
variable holds the address of the data structure holding the assigned value.
Each data structure must keep track of the variables that point to it to know
when information is no longer referred to by any variables and might be
safely discarded to free up memory.
All this requires a lot of time, but many number crunching tasks demanding
high performance don't require such flexibility. They are performed on large
multidimensional arrays of numbers of the same type. A simple example
might be image data represented by a three dimensional array of 8-bit,
unsigned integers representing the brightness of each image pixel in three
color channels.
A computer can operate on such structures very efficiently, often doing
operations on several neighboring numbers at once. The multiple processor
cores of modern multicore processors can operate on different parts of the
arrays simultaneously, further increasing performance. These arrays are
created and destroyed at once and all their elements are of the same type, so
type and reference count information should be stored only once for the
entire array instead of keeping it for each of the millions of elements.
A standard library for working with such arrays of numbers in Python is
numpy. Numpy can create one dimensional arrays from Python lists or
multidimensional arrays from nested lists of lists. It allows you to create
arrays of zeroes of known dimensions and type, and arrays containing ranges
of floating point or integer values.

import numpy as np

a = [0, 1, 2, 3]

# make array of integers
a1 = np.array(a, dtype='i4')
print('a1 = ', a1)

# make array of floating point numbers
a2 = np.array(a, dtype='f4')
print('a2 = ', a2)

# print elementwise sum of arrays
print('a1 + a2 = ', a1 + a2)

The result of running this in an IPython notebook is:


a1 = [0 1 2 3]
a2 = [ 0. 1. 2. 3.]
a1 + a2 = [ 0. 2. 4. 6.]
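
The dtype strings used above can be inspected directly. The short sketch below is an illustration added for clarity; it assumes only that 'i4' and 'f4' denote 4-byte integers and 4-byte floats, and shows which type the sum of the two arrays receives.

```python
import numpy as np

a1 = np.array([0, 1, 2, 3], dtype='i4')   # 32-bit integers
a2 = np.array([0, 1, 2, 3], dtype='f4')   # 32-bit floats

print(a1.dtype, a1.itemsize)   # itemsize is the number of bytes per element
print(a2.dtype, a2.itemsize)

# For mixed inputs NumPy picks a type that can represent both of them
total = a1 + a2
print(total.dtype)
```

Since the type is stored once per array, checking dtype is a cheap way to confirm what an operation actually produced.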

Multidimensional arrays can be created in several ways, such as:

b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# multidimensional array from nested lists
b1 = np.array(b)
print('b1 = \n', b1)
print()
b2 = np.zeros((3, 3))  # 3x3 array of zeros
print('b2 = \n', b2)

The result will be

b1 =
[[1 2 3]
[4 5 6]
[7 8 9]]

b2 =
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
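
The image example mentioned earlier, a three dimensional array of 8-bit unsigned integers, might be sketched like this. The 4x6 image size and the fill value are made up for illustration. The sketch compares one whole-array operation with the equivalent explicit Python loops.

```python
import numpy as np

# Hypothetical 4x6 RGB image: height x width x 3 color channels, 8 bits each
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :, 0] = 200          # fill the red channel in one operation

# Per-pixel brightness, computed for the whole array at once
brightness = image.mean(axis=2)

# The same calculation with explicit Python loops
slow = [[sum(int(v) for v in image[r, c]) / 3 for c in range(6)]
        for r in range(4)]

print(image.nbytes)            # one byte per element: 4 * 6 * 3
print(brightness.shape)
print(np.allclose(brightness, slow))
```

Both approaches give the same numbers; the difference is that the array version leaves the looping to compiled code.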

Two useful numpy array constructors are arange and linspace. The
first works like Python's range function, in which start, end, and step
values can be specified. The second creates a range from given start and
end values and the number of points. Both start and end values are included.
Linspace is useful to generate equally spaced x values that can be used to
calculate y values for drawing function graphs.
c1 = np.arange(10)
print('c1 = ', c1)
c2 = np.linspace(0, 10, 15)
print('c2 = ', c2)

This code will generate the following output:

c1 = [0 1 2 3 4 5 6 7 8 9]
c2 = [ 0. 0.71428571 1.42857143 2.14285714 2.85714286
3.57142857 4.28571429 5. 5.71428571 6.42857143
7.14285714 7.85714286 8.57142857 9.28571429 10. ]
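
To make the difference between the two constructors concrete, here is a small sketch with made-up ranges. It also shows the graphing use case: the y values are computed from the x values in a single elementwise expression.

```python
import numpy as np

# arange takes start, end, and step; the end value itself is excluded
steps = np.arange(2, 11, 3)
print(steps)

# linspace takes start, end, and the number of points; both ends are included
x = np.linspace(0.0, 1.0, 5)
y = x ** 2          # y values for the graph of y = x^2, computed elementwise
print(x)
print(y)
```

The resulting x and y arrays can be handed directly to a plotting function.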

Besides array creation and arithmetic operations, numpy offers powerful
indexing and slicing mechanisms. Rows and columns can be obtained by
slicing and then used on either side of an assignment:

b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b1 = np.array(b)
print('b1 = \n', b1)
print()
print('row 2 = ', b1[1, :])
print('column 2 = ', b1[:, 1])
print()

# fill second row with values
# from the second column
b1[1, :] = b1[:, 1]
print('b1 = \n', b1)

This code will produce output

b1 =
[[1 2 3]
[4 5 6]
[7 8 9]]

row 2 = [4 5 6]
column 2 = [2 5 8]

b1 =
[[1 2 3]
[2 5 8]
[7 8 9]]

Arrays can be tested for a condition. The result will be a boolean array of
the same dimensions with "True" wherever an element satisfies the
condition and "False" for all other elements. Boolean arrays, in turn, can
be used for indexing in order to get only the elements that satisfy
the condition.

c1 = np.arange(10)
print('c1 = ', c1)
print('c1 < 5 = ', c1 < 5)
print('c1[c1<5] = ', c1[c1 < 5])
c1[c1 > 5] = 0  # can be used on either side of assignment
print('c1 = ', c1)

The output produced is:

c1 = [0 1 2 3 4 5 6 7 8 9]
c1 < 5 = [ True True True True True False False False False False]
c1[c1<5] = [0 1 2 3 4]
c1 = [0 1 2 3 4 5 0 0 0 0]
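
Boolean arrays can also be combined. The following sketch, with made-up thresholds, shows two conditions joined elementwise and a non-destructive alternative to assigning through a mask.

```python
import numpy as np

c = np.arange(10)

# Conditions are combined elementwise with & (and) and | (or);
# the parentheses are required because & binds tighter than comparisons
middle = c[(c > 2) & (c < 7)]
print(middle)

# np.where builds a new array instead of modifying c in place
clipped = np.where(c > 5, 0, c)
print(clipped)

# Counting the True values tells how many elements satisfy a condition
n_large = np.count_nonzero(c > 5)
print(n_large)
```

Using np.where leaves the original array untouched, which is convenient when the unmodified data is still needed.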

This just scratches the surface of numpy functionality. Numpy provides
many tools for array operations, mathematical functions, statistical
operations, linear algebra operations, etc. This functionality is further
enhanced by the scipy library. Both numpy and scipy have excellent online
documentation. Plenty of online tutorials and books for numpy and scipy
are available as well. I highly recommend that you at least look through the
table of contents to have an idea of what is already available for you in
these packages.

Numba - Just In Time Python compiler


Although numpy is written in C or Fortran and standard routines working
on arrays of data are highly optimized, non-standard operations are still
coded in Python and might be painfully slow. Fortunately, Continuum
Analytics (now Anaconda, Inc.) developed a package that can translate
Python code into native machine code on the fly and execute it at speeds
comparable to C programs. In some respects, this approach is even better
than ahead-of-time compilation because the resulting code is optimized for
each particular machine and can take advantage of all the features of its
processor, whereas regular compiled programs might ignore some processor
features for the sake of compatibility with older machines, or might have
been compiled before the new features were developed.
Besides, a Python program that uses the Numba just-in-time compiler will
work on any platform for which Python and Numba are available. The user
does not need to worry about a C compiler. There is no hassle with
dependencies or complex makefiles and scripts. Python code just works out
of the box, taking full advantage of all available hardware.
The LLVM compiler infrastructure used by Numba allows compiled code to run
on different processor architectures, GPUs, and accelerator boards. It is
under heavy development: while I was writing this book, execution times for
the example programs were cut by more than half.
Such heavy development on both Numba and LLVM has some
disadvantages as well. Obviously, some Python features can never be
significantly accelerated, but others can and will be accelerated in future
versions of Numba. When I started working on this book, Numba's
compiled functions could not handle lists or create numpy arrays. Now they
can. As a result, some material in this section will become obsolete well
before the rest of the book. But that is a good thing. Just keep an eye on
the Numba web site at numba.pydata.org.
For some strange reason, numba was not included in the Anaconda Linux
installer, so I had to install it manually by opening the anaconda3/bin
folder in a terminal and typing

conda install numba

The same should work on Windows. Just use the terminal shortcut from
Anaconda's folder in the Windows start menu. Numba is usually included with
later versions of WinPython. If not, download the wheel package and
its dependencies from Christoph Gohlke's page and install them
using WinPython's setup utility.
To illustrate the speedups you can get with numba, I'll implement the Sieve
of Eratosthenes prime number search algorithm. In order to accelerate a
function, Numba needs to know the types of all the variables, or at least be
able to infer them, and these types should not change during function
execution. For this reason, numpy arrays are the data structures of
choice when working with numba.
Here is the Python code:

from numba import jit
import numpy as np
import time

@jit('i4(i4[:])')
def get_primes(a):
    m = a.shape[0]
    a[:2] = 0  # 0 and 1 are not prime
    n = 0
    for i in range(2, m):
        if a[i] != 0:
            n += 1
            for j in range(i**2, m, i):
                a[j] = 0
    return n

# create an array of integers from 0 to a million
a = np.arange(1000000, dtype=np.int32)
start_time = time.time()  # get system time
n = get_primes(a)  # count prime numbers
# print the number of prime numbers below a million
# and the execution time
print(n, time.time() - start_time)

First, we import numba, numpy, and the time module that will be used to
time the program execution. Then, we need a function implementing the
Sieve of Eratosthenes on a numpy array of integers. The function's definition
is preceded by the decorator @jit (Just In Time compile) imported from
the numba package. It tells numba to compile this function into machine
code. The rest of the program is executed as plain Python. The signature
string passed to the decorator tells numba that the function returns a
four-byte (32-bit) integer and receives a parameter that is a
one-dimensional array of four-byte integers.
Using numpy's arange function, we create an array of consecutive
integers from zero up to a million and record the current time. We then call
the function get_primes, which counts the prime numbers in the array and
zeroes out the non-prime numbers. As soon as the function returns, we get
the current time again and print the number of prime numbers found as well
as the time the function took to execute.
On my Sandy Bridge laptop, the numba-accelerated function takes about 7 ms
to complete. If I comment out the @jit decorator,

#@jit('i4(i4[:])')

the execution time increases to 3 s. Compilation results in a 428-fold
speedup. Not bad for one line of code. Searching for prime numbers
between 1 and 10 million takes 146 ms with numba and 42 s in pure Python.
This is a 287-fold speedup. These numbers are bound to
change as numba, LLVM, and processors improve.
Because the function get_primes gets just a reference to, and not a copy of,
the original array, non-prime numbers in the array are still zeroed out and
we can get the prime numbers using the boolean indexing discussed in the
numpy section:
print(a[a > 0])

The default array printing behavior is not particularly useful here, as it
only shows a few numbers at the beginning and the end of the array. You can
change this behavior or just iterate through the filtered array using a for
loop.
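
Both options can be sketched as follows. The array named filtered is only a stand-in for a long filtered result, and the threshold value passed to np.set_printoptions is one possible choice, not the only one.

```python
import sys
import numpy as np

a = np.arange(5000)
filtered = a[a % 3 == 0]      # stand-in for a long filtered array

print(filtered)               # summarized, with '...' in the middle

# option 1: raise the size threshold above which numpy summarizes arrays
np.set_printoptions(threshold=sys.maxsize)
print(filtered)               # now every element is printed

# option 2: iterate through the array with a for loop
for value in filtered[:5]:    # only the first five, to keep output short
    print(value)
```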

Troubleshooting numba functions


Although numba is under heavy development and is quickly becoming
more robust, it is still a tool for optimizing the most critical parts of code.
These parts should be refactored into small functions, debugged in plain
Python, and then decorated with numba's @jit decorator.
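
A minimal sketch of this workflow, assuming a small sieve function similar to the one in the previous section: the function is first run undecorated on an input small enough to check by hand. The name count_primes and the array size of 30 are chosen only for illustration.

```python
import numpy as np

def count_primes(a):
    """Sieve of Eratosthenes on an array with a[i] == i; returns the prime count."""
    m = a.shape[0]
    a[:2] = 0                  # 0 and 1 are not prime
    n = 0
    for i in range(2, m):
        if a[i] != 0:
            n += 1
            for j in range(i * i, m, i):
                a[j] = 0       # cross out multiples of i
    return n

small = np.arange(30, dtype=np.int32)
n = count_primes(small)
print(n, small[small > 0])     # easy to verify by hand for numbers below 30

# Once the plain Python result looks right, put the @jit decorator
# (from numba import jit) above the function definition.
```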
In the best-case scenario, you will instantly see a performance boost. But
sometimes you see no difference. It is likely that numba failed to compile
the function into machine code and fell back on using Python objects to
represent the problematic variables. This slows execution down to almost a
pure Python level. Perhaps, in some cases, it is good that the function
doesn't fail completely, but it doesn't report problems either, so you don't
know whether you could tweak your code a little to get your two orders of
magnitude performance increase.
One way to force compilation to machine code is to give the @jit
decorator the parameter nopython=True. This forces numba to fail
compilation and show an error message if any variable cannot be
compiled into the processor's native type. Another approach is to set the
environment variable NUMBA_WARNINGS before importing numba.
You can do this from within your Python script by adding two lines at the
top of it.

import os
os.environ['NUMBA_WARNINGS'] = "1"

from numba import jit


Finally, you can dump numba's intermediate representation of your function
by calling the method inspect_types on your numba-compiled function.
If any variable has type pyobject instead of something like int32 or
float64, there might be a problem. As numba gets smarter, the
impact of this problem outside of tight loops might diminish dramatically,
but, on the other hand, the problematic parts of code that can easily reduce
performance several fold become harder to spot.
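
The nopython parameter and the inspect_types method can be tried on a small type-stable function, as in the sketch below. The function sum_of_squares is hypothetical, and the fallback stub for jit is my own assumption, added only so that the script still runs on a machine without numba.

```python
import numpy as np

try:
    from numba import jit
    HAVE_NUMBA = True
except ImportError:
    # assumption for illustration: without numba, run as plain Python
    HAVE_NUMBA = False
    def jit(*args, **kwargs):
        return lambda func: func

@jit(nopython=True)            # fail loudly instead of falling back
def sum_of_squares(a):
    total = 0.0                # stays a float for the whole function
    for i in range(a.shape[0]):
        total += a[i] * a[i]
    return total

values = np.arange(5, dtype=np.float64)
print(sum_of_squares(values))  # 0 + 1 + 4 + 9 + 16

if HAVE_NUMBA:
    # prints the inferred type of every variable; look for pyobject
    sum_of_squares.inspect_types()
```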
Describing the types of function parameters, return value, and local
variables in the @jit decorator might also significantly increase the
performance of your numba-compiled function. You might also experiment with
some additional numba compilation parameters. For instance, the use of AVX
instructions is disabled on Sandy Bridge and Ivy Bridge processors by
default, and you might want to try enabling it. This can be done by setting
the environment variable NUMBA_ENABLE_AVX. In case you are
curious to see the assembly code of your numba-compiled function, you
may ask numba to print it for you by setting the environment variable
NUMBA_DUMP_ASSEMBLY.

import os
os.environ['NUMBA_ENABLE_AVX'] = "1"
os.environ['NUMBA_DUMP_ASSEMBLY'] = "1"
from numba import jit

See the numba documentation for more details.

Process level parallelism


If you need even higher performance, you can use process-level
parallelism. Python objects are not designed for use in parallel programs,
so Python employs a Global Interpreter Lock (GIL) that prevents threads
from executing Python code in parallel. Compiled modules, including
numba-compiled functions, can use parallelism to take advantage of multicore
processors or several processors in a system. Autoparallelization might
eventually make the use of parallelism transparent to the programmer. But,
for now, the use of multiple cores is complicated.
Fortunately, time-consuming computations can often be divided into
independent chunks. For instance, we will explore image analysis in the
next chapter, which takes a noticeable amount of time. If you have several
hundred images to analyze, a program run might take minutes or even
hours. But the analysis of different images can be carried out
independently, so you can spawn several subprocesses, each running an
independent Python interpreter, and hand every subprocess its fair share of
images to analyze. You can start as many processes as your processor has
cores, or, if you have a processor capable of multithreading, as many as the
number of threads it supports simultaneously. Each process gets a list of
file names and returns the results of the analysis to the parent process.
Python's standard library offers facilities to simplify this approach in a
module called multiprocessing. It even allows you to utilize other
computers over a network. Of course, you can still take advantage of
Numba's just-in-time compilation. Actually, I suggest you try it first. The
200-fold speedup you can obtain with Numba might be all you need. It is
definitely worth trying before you buy 200 computers to make a cluster or
start hogging the resources of a cluster you have at work. Using Numba in a
cluster will probably require installing it on each computer.
Using the multiprocessing module is pretty simple.

import glob
import os
from multiprocessing import Pool

def f(file_name):  # worker function
    process_id = os.getpid()  # obtain process id
    file_size = os.path.getsize(file_name)  # obtain file size
    return [process_id, file_size, file_name]

if __name__ == '__main__':  # if executed by a parent process
    with Pool(8) as p:  # create a pool of 8 workers
        # obtain names of files in working directory
        # using glob function from module glob
        files = glob.glob('*.*')
        result = p.map(f, files)  # run analysis in parallel
    for r in result:  # print the results
        print('\t'.join([str(s) for s in r]))

It is important to spawn child processes from within the

if __name__ == '__main__':

block. The copies of the Python interpreter running in child
processes open the same module to import the worker function. This
conditional statement prevents them from spawning new processes
recursively.
The Pool object has several methods that allow it to run workers
asynchronously, set a time limit for the completion of the parallel task,
control whether the results are returned in the same order as the arguments
or in an arbitrary order, etc. I refer you to the multiprocessing module
documentation for further detail.
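
Two of these methods are sketched below, under the assumption of a trivial worker function; square, the pool size of 4, and the 10 second time limit are all hypothetical choices. Pool.imap_unordered yields results in whatever order the workers finish, while Pool.map_async returns immediately and lets you wait for the result with a time limit.

```python
import os
from multiprocessing import Pool

def square(x):                        # hypothetical worker function
    return x * x, os.getpid()

if __name__ == '__main__':
    with Pool(4) as p:
        # results arrive in completion order, not in argument order
        for value, pid in p.imap_unordered(square, range(8)):
            print(value, pid)

        # asynchronous call: returns at once, the work runs in background
        pending = p.map_async(square, range(8))
        # wait at most 10 seconds; raises multiprocessing.TimeoutError
        # if the results are not ready by then
        results = pending.get(timeout=10)
        print([value for value, pid in results])
```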
