0% found this document useful (0 votes)
105 views12 pages

Chap05 - Jupyter Notebook

Uploaded by

Aquiles Alonzo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
105 views12 pages

Chap05 - Jupyter Notebook

Uploaded by

Aquiles Alonzo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

7/4/2020 chap05soln - Jupyter Notebook

Modeling and Simulation in Python


Chapter 5

Copyright 2017 Allen Downey

License: Creative Commons Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0)

In [1]:

# Configure Jupyter so figures appear in the notebook


%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment


%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import functions from the modsim.py module


from modsim import *

Reading data
Pandas is a library that provides tools for reading and processing data. read_html reads a web page from a
file or the Internet and creates one DataFrame for each table on the page.

In [2]:

from pandas import read_html

The data directory contains a downloaded copy of https://en.wikipedia.org/wiki/World_population_estimates


(https://en.wikipedia.org/wiki/World_population_estimates)

The arguments of read_html specify the file to read and how to interpret the tables in the file. The result,
tables , is a sequence of DataFrame objects; len(tables) reports the length of the sequence.

In [3]:

filename = 'data/World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
len(tables)

Out[3]:

We can select the DataFrame we want using the bracket operator. The tables are numbered from 0, so
tables[2] is actually the third table on the page.

head selects the header and the first five rows.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 1/12
7/4/2020 chap05soln - Jupyter Notebook

In [4]:

table2 = tables[2]
table2.head()

Out[4]:

United
United Population Nations
States Reference Department
Maddison HYDE (2007) Tanton (1994)
Census Bureau of Economic
(2008)[17] [24] [18] (1
Bureau (1973–2016) and Social
(2017)[28] [15] Affairs (2015)
[16]

Year

1950 2557628654 2.516000e+09 2.525149e+09 2.544000e+09 2.527960e+09 2.400000e+09 2.527

1951 2594939877 NaN 2.572851e+09 2.571663e+09 NaN NaN

1952 2636772306 NaN 2.619292e+09 2.617949e+09 NaN NaN

1953 2682053389 NaN 2.665865e+09 2.665959e+09 NaN NaN

1954 2730228104 NaN 2.713172e+09 2.716927e+09 NaN NaN

tail selects the last five rows.

In [5]:

table2.tail()

Out[5]:

United
United Population Nations
McEvedy
States Reference Department HYDE Tanton Biraben
Maddison & Jones Th
Census Bureau of Economic (2007) (1994) (1980)
(2008)[17] (1978)
Bureau (1973–2016) and Social [24] [18] [19]
[20]
(2017)[28] [15] Affairs (2015)
[16]

Year

2012 7013871313 7.057075e+09 7.080072e+09 NaN NaN NaN NaN NaN

2013 7092128094 7.136796e+09 7.162119e+09 NaN NaN NaN NaN NaN

2014 7169968185 7.238184e+09 7.243784e+09 NaN NaN NaN NaN NaN

2015 7247892788 7.336435e+09 7.349472e+09 NaN NaN NaN NaN NaN

2016 7325996709 7.418152e+09 NaN NaN NaN NaN NaN NaN

Long column names are awkard to work with, but we can replace them with abbreviated names.

In [6]:

table2.columns = ['census', 'prb', 'un', 'maddison',


'hyde', 'tanton', 'biraben', 'mj',
'thomlinson', 'durand', 'clark']

Here's what the DataFrame looks like now.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 2/12
7/4/2020 chap05soln - Jupyter Notebook

In [7]:

table2.head()

Out[7]:

census prb un maddison hyde tanton

Year

1950 2557628654 2.516000e+09 2.525149e+09 2.544000e+09 2.527960e+09 2.400000e+09 2.527

1951 2594939877 NaN 2.572851e+09 2.571663e+09 NaN NaN

1952 2636772306 NaN 2.619292e+09 2.617949e+09 NaN NaN

1953 2682053389 NaN 2.665865e+09 2.665959e+09 NaN NaN

1954 2730228104 NaN 2.713172e+09 2.716927e+09 NaN NaN

The first column, which is labeled Year , is special. It is the index for this DataFrame , which means it
contains the labels for the rows.

Some of the values use scientific notation; for example, 2.544000e+09 is shorthand for 2.544 ⋅ 109 or
2.544 billion.

NaN is a special value that indicates missing data.

Series
We can use dot notation to select a column from a DataFrame . The result is a Series , which is like a
DataFrame with a single column.

In [8]:

census = table2.census
census.head()

Out[8]:

Year
1950 2557628654
1951 2594939877
1952 2636772306
1953 2682053389
1954 2730228104
Name: census, dtype: int64

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 3/12
7/4/2020 chap05soln - Jupyter Notebook

In [9]:

census.tail()

Out[9]:

Year
2012 7013871313
2013 7092128094
2014 7169968185
2015 7247892788
2016 7325996709
Name: census, dtype: int64

Like a DataFrame , a Series contains an index, which labels the rows.

1e9 is scientific notation for 1 ⋅ 109 or 1 billion.


From here on, we will work in units of billions.

In [10]:

un = table2.un / 1e9
un.head()

Out[10]:

Year
1950 2.525149
1951 2.572851
1952 2.619292
1953 2.665865
1954 2.713172
Name: un, dtype: float64

In [11]:

census = table2.census / 1e9


census.head()

Out[11]:

Year
1950 2.557629
1951 2.594940
1952 2.636772
1953 2.682053
1954 2.730228
Name: census, dtype: float64

Here's what these estimates look like.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 4/12
7/4/2020 chap05soln - Jupyter Notebook

In [12]:

plot(census, ':', label='US Census')


plot(un, '--', label='UN DESA')

decorate(xlabel='Year',
ylabel='World population (billion)')

savefig('figs/chap05-fig01.pdf')

Saving figure to file figs/chap05-fig01.pdf

The following expression computes the elementwise differences between the two series, then divides through
by the UN value to produce relative errors (https://en.wikipedia.org/wiki/Approximation_error), then finds the
largest element.

So the largest relative error between the estimates is about 1.3%.

In [13]:

max(abs(census - un) / un) * 100

Out[13]:

1.3821293828998855

Exercise: Break down that expression into smaller steps and display the intermediate results, to make sure
you understand how it works.

1. Compute the elementwise differences, census - un


2. Compute the absolute differences, abs(census - un)
3. Compute the relative differences, abs(census - un) / un
4. Compute the percent differences, abs(census - un) / un * 100
localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 5/12
7/4/2020 chap05soln - Jupyter Notebook

In [14]:

# Solution

census - un

Out[14]:

Year
1950 0.032480
1951 0.022089
1952 0.017480
1953 0.016188
1954 0.017056
1955 0.020448
1956 0.023728
1957 0.028307
1958 0.032107
1959 0.030321
1960 0.016999
1961 0.001137
1962 -0.000978
1963 0.008650
1964 0.017462
1965 0.021303
1966 0.023203

In [15]:

# Solution

abs(census - un)

Out[15]:

Year
1950 0.032480
1951 0.022089
1952 0.017480
1953 0.016188
1954 0.017056
1955 0.020448
1956 0.023728
1957 0.028307
1958 0.032107
1959 0.030321
1960 0.016999
1961 0.001137
1962 0.000978
1963 0.008650
1964 0.017462
1965 0.021303
1966 0.023203

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 6/12
7/4/2020 chap05soln - Jupyter Notebook

In [16]:

# Solution

abs(census - un) / un

Out[16]:

Year
1950 0.012862
1951 0.008585
1952 0.006674
1953 0.006072
1954 0.006286
1955 0.007404
1956 0.008439
1957 0.009887
1958 0.011011
1959 0.010208
1960 0.005617
1961 0.000369
1962 0.000311
1963 0.002702
1964 0.005350
1965 0.006399
1966 0.006829

In [17]:

# Solution

max(abs(census - un) / census) * 100

Out[17]:

1.4014999251669376

max and abs are built-in functions provided by Python, but NumPy also provides version that are a little
more general. When you import modsim , you get the NumPy versions of these functions.

Constant growth

We can select a value from a Series using bracket notation. Here's the first element:

In [18]:

census[1950]

Out[18]:

2.557628654

And the last value.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 7/12
7/4/2020 chap05soln - Jupyter Notebook

In [19]:

census[2016]

Out[19]:

7.325996709

But rather than "hard code" those dates, we can get the first and last labels from the Series :

In [20]:

t_0 = get_first_label(census)

Out[20]:

1950

In [21]:

t_end = get_last_label(census)

Out[21]:

2016

In [22]:

elapsed_time = t_end - t_0

Out[22]:

66

And we can get the first and last values:

In [23]:

p_0 = get_first_value(census)

Out[23]:

2.557628654

In [24]:

p_end = get_last_value(census)

Out[24]:

7.325996709

Then we can compute the average annual growth in billions of people per year.

In [25]:

total_growth = p_end - p_0

Out[25]:

4.768368055

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 8/12
7/4/2020 chap05soln - Jupyter Notebook

In [26]:

annual_growth = total_growth / elapsed_time

Out[26]:

0.07224800083333333

TimeSeries

Now let's create a TimeSeries to contain values generated by a linear growth model.

In [27]:

results = TimeSeries()

Out[27]:

values

Initially the TimeSeries is empty, but we can initialize it so the starting value, in 1950, is the 1950 population
estimated by the US Census.

In [28]:

results[t_0] = census[t_0]
results

Out[28]:

values

1950 2.557629

After that, the population in the model grows by a constant amount each year.

In [29]:

for t in linrange(t_0, t_end):


results[t+1] = results[t] + annual_growth

Here's what the results looks like, compared to the actual data.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 9/12
7/4/2020 chap05soln - Jupyter Notebook

In [30]:

plot(census, ':', label='US Census')


plot(un, '--', label='UN DESA')
plot(results, color='gray', label='model')

decorate(xlabel='Year',
ylabel='World population (billion)',
title='Constant growth')

savefig('figs/chap05-fig02.pdf')

Saving figure to file figs/chap05-fig02.pdf

The model fits the data pretty well after 1990, but not so well before.

Exercises
Optional Exercise: Try fitting the model using data from 1970 to the present, and see if that does a better job.

Hint:

1. Copy the code from above and make a few changes. Test your code after each small change.
2. Make sure your TimeSeries starts in 1950, even though the estimated annual growth is based on later
data.
3. You might want to add a constant to the starting value to match the data better.

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 10/12
7/4/2020 chap05soln - Jupyter Notebook

In [31]:

# Solution

def compute_annual_growth(t_0, t_end):


"""Average annual growth over given period.

t_0: start date


t_end: end_date

returns: average annual growth


"""
elapsed_time = t_end - t_0
p_0 = census[t_0]
p_end = census[t_end]
total_growth = p_end - p_0
annual_growth = total_growth / elapsed_time
return annual_growth

# compute annual growth using data from 1970 to the end


t_0 = 1970
t_end = get_last_label(census)
annual_growth = compute_annual_growth(t_0, t_end)

# Run the simulation over the whole time range.


# I subtract 0.45 from the initial value to shift
# the fitted curve down so it fits the data better.
t_0 = get_first_label(census)
t_end = get_last_label(census)
p_0 = get_first_value(census) - 0.45

# initialize the result


results = TimeSeries()
results[t_0] = p_0

# run the simulation


for t in linrange(t_0, t_end):
results[t+1] = results[t] + annual_growth

# plot the results


plot(census, ':', label='US Census')
plot(un, '--', label='UN DESA')
plot(results, '--', color='gray', label='model')

decorate(xlabel='Year',
ylabel='World population (billion)',
title='Constant growth')

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 11/12
7/4/2020 chap05soln - Jupyter Notebook

In [32]:

census.loc[1960:1970]

Out[32]:

Year
1960 3.043002
1961 3.083967
1962 3.140093
1963 3.209828
1964 3.281201
1965 3.350426
1966 3.420678
1967 3.490334
1968 3.562314
1969 3.637159
1970 3.712698
Name: census, dtype: float64

In [ ]:

localhost:8888/notebooks/unphu/simulacion/exercises/ModSimPy/soln/chap05soln.ipynb 12/12

You might also like