Intermediate Python
print() the last item from both the year and the pop list to see what the
predicted population for the year 2100 is. Use two print() functions.
Before you can start, you should
import matplotlib.pyplot as plt. pyplot is a sub-package of matplotlib,
hence the dot.
Use plt.plot() to build a line plot. year should be mapped on the
horizontal axis, pop on the vertical axis. Don’t forget to finish off
with the plt.show() function to actually display the plot.

# edited/added
import numpy as np
year=list(range(1950,2100+1))
pop=list(np.loadtxt('pop1.txt', dtype=float))
# Print the last item from year and pop
print(year[-1])
print(pop[-1])
# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt
# Make a line plot: year on the x-axis, pop on the y-axis
plt.plot(year, pop)
# Display the plot with plt.show()
plt.show()

Line Plot (2): Interpretation
Have another look at the plot you created in the previous exercise; it’s shown
on the right. Based on the plot, in approximately what year will there be
more than ten billion human beings on this planet?

2040
2095

Line plot (3)
Print the last item from both the list gdp_cap, and the list life_exp; it
is information about Zimbabwe.
Build a line chart, with gdp_cap on the x-axis, and life_exp on the y-axis.
Does it make sense to plot this data on a line plot?
Don’t forget to finish off with a plt.show() command, to actually
display the plot.

# edited/added
gdp_cap=list(np.loadtxt('gdp_cap.txt', dtype=float))
life_exp=list(np.loadtxt('life_exp.txt', dtype=float))
# Print the last item of gdp_cap and life_exp
print(gdp_cap[-1])
print(life_exp[-1])
# Make a line plot, gdp_cap on the x-axis, life_exp on the y-axis
plt.plot(gdp_cap, life_exp)
# Display the plot
plt.show()

Scatter Plot (1)
Change the line plot that’s coded in the script to a scatter plot.
A correlation will become clear when you display the GDP per capita
on a logarithmic scale. Add the line plt.xscale('log').
Finish off your script with plt.show() to display the plot.

# Change the line plot below to a scatter plot
plt.scatter(gdp_cap, life_exp)

Scatter plot (2)
Start from scratch: import matplotlib.pyplot as plt.
Build a scatter plot, where pop is mapped on the horizontal axis,
and life_exp is mapped on the vertical axis.
Finish the script with plt.show() to actually display the plot. Do you
see a correlation?

# edited/added
pop=list(np.loadtxt('pop2.txt', dtype=float))
# Import package
import matplotlib.pyplot as plt
# Build Scatter plot
plt.scatter(pop, life_exp)
# Show plot
plt.show()

Build a histogram (1)
Use plt.hist() to create a histogram of the values in life_exp. Do not
specify the number of bins; Python will set the number of bins to 10
by default for you.
Add plt.show() to actually display the histogram. Can you tell which
bin contains the most observations?

# Create histogram of life_exp data

Build a histogram of life_exp, with 5 bins. Can you tell which bin
contains the most observations?
Build another histogram of life_exp, this time with 20 bins. Is this
better?

# Build histogram with 5 bins
plt.hist(life_exp, bins = 5)
# Show and clear plot
plt.show()
plt.clf()
# Build histogram with 20 bins
plt.hist(life_exp, bins = 20)
# Show and clear plot again
plt.show()
plt.clf()

Build a histogram (3): compare
Build a histogram of life_exp with 15 bins.
Build a histogram of life_exp1950, also with 15 bins. Is there a big
difference with the histogram for the 2007 data?
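
To make the effect of the bins argument concrete, here is a minimal sketch of equal-width binning in plain Python. It only approximates what plt.hist() counts internally (floating-point edge cases may differ). The function name bin_counts and the sample values are hypothetical, not the course data, and the sketch assumes the values are not all identical.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, so clamp the index
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical life expectancies, for illustration only
sample = [43.8, 50.7, 56.7, 59.4, 64.1, 70.3, 72.4, 74.2, 75.6, 78.3, 80.7, 82.6]

# Few bins give a coarse picture; many bins leave most intervals nearly empty
print(bin_counts(sample, 5))
print(bin_counts(sample, 20))
```

With only a dozen observations, 20 bins spread the data so thinly that the overall shape is hard to see, which is the trade-off the exercises ask you to judge by eye.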
# edited/added
life_exp1950=list(np.loadtxt('life_exp1950.txt', dtype=float))
# Histogram of life_exp, 15 bins

Line plot
Scatter plot
Ticks
Use tick_val and tick_lab as inputs to the xticks() function to make
the plot more readable.
As usual, display the plot with plt.show() after you’ve added the
customizations.

# Scatter plot
plt.scatter(gdp_cap, life_exp)
# Adapt the ticks on the x-axis

o Double the values in np_pop by setting the value of np_pop equal
to np_pop * 2. Because np_pop is a NumPy array, each array
element will be doubled.
o Change the s argument inside plt.scatter() to
be np_pop instead of pop.

# Import numpy as np
import numpy as np
plt.xlabel('GDP per Capita [in USD]')

# edited/added
col=list(np.loadtxt('col.txt', dtype=str))
# Specify c and alpha inside plt.scatter()
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
# Show the plot
plt.show()

Additional Customizations
Add plt.grid(True) after the plt.text() calls so that gridlines are drawn
on the plot.

# Scatter plot
plt.scatter(x = gdp_cap, y = life_exp, s = np.array(pop) * 2, c = col, alpha = 0.8)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000,10000,100000], ['1k','10k','100k'])
# Additional customizations
plt.text(1550, 71, 'India')
plt.text(5700, 80, 'China')
# Add grid() call
plt.grid(True)
# Show the plot
plt.show()

Interpretation
If you have a look at your colorful plot, it’s clear that people live longer in
countries with a higher GDP per capita. No high income countries have really
short life expectancy, and no low income countries have very long life
expectancy. Still, there is a huge difference in life expectancy between
countries on the same income level. Most people live in middle income
countries, where the difference in lifespan between countries is huge, depending
on how income is distributed and how it is used.

What can you say about the plot?
The countries in blue, corresponding to Africa,
have both low life expectancy and a low GDP per
capita.
There is a negative correlation between GDP per
capita and life expectancy.
China has both a lower GDP per capita and
lower life expectancy compared to India.

Motivation for dictionaries
Use the index() method on countries to find the index of 'germany'.
Store this index as ind_ger.
Use ind_ger to access the capital of Germany from the capitals list.
Print it out.

# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# Get index of 'germany': ind_ger
ind_ger = countries.index('germany')
# Use ind_ger to print out capital of Germany
print(capitals[ind_ger])

Create dictionary
With the strings in countries and capitals, create a dictionary
called europe with 4 key:value pairs. Beware of capitalization! Make
sure you use lowercase characters everywhere.
Print out europe to see if the result is what you expected.

# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']
# From string in countries and capitals, create dictionary europe
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo'}
# Print europe
print(europe)

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
# Print out the keys in europe
print(europe.keys())
# Print out value that belongs to key 'norway'
print(europe['norway'])

Dictionary Manipulation (1)
Add the key 'italy' with the value 'rome' to europe.
To assert that 'italy' is now a key in europe, print out 'italy' in europe.
Add another key:value pair to europe: 'poland' is the key, 'warsaw' is
the corresponding value.
Print out europe.

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }
# Add italy to europe
europe['italy'] = 'rome'
# Print out italy in europe
print('italy' in europe)
# Add poland to europe
europe['poland'] = 'warsaw'
# Print europe
print(europe)
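
The dictionary operations in these exercises (adding a key, testing membership, overwriting a value, deleting a key) can be sketched together in one short script. This is an illustration under stated assumptions rather than an exercise solution: the starting dictionary is a shortened variant of europe, and capital_of is a hypothetical helper that does not appear in the course.

```python
# Shortened variant of the europe dictionary, with two deliberate mistakes
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'bonn', 'australia': 'vienna'}

# Assigning to a key that does not exist yet adds a new key:value pair
europe['italy'] = 'rome'
has_italy = 'italy' in europe  # membership tests look at keys, not values

# Assigning to an existing key overwrites its value
europe['germany'] = 'berlin'

# del removes a key together with its value
del europe['australia']

# Hypothetical helper: a lookup that tolerates missing keys via dict.get()
def capital_of(country, capitals=europe, default='unknown'):
    return capitals.get(country, default)

print(has_italy)
print(europe)
print(capital_of('norway'))
```

Indexing with square brackets raises a KeyError for a missing key, whereas get() returns the default, which is why the helper uses get().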
Access dictionary
Check out which keys are in europe by calling the keys() method
on europe. Print out the result.
Print out the value that belongs to the key 'norway'.

Dictionary Manipulation (2)
The capital of Germany is not 'bonn'; it’s 'berlin'. Update its value.
Australia is not in Europe, Austria is! Remove the
key 'australia' from europe.
Print out europe to see if your cleaning work paid off.

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'bonn',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw',
          'australia':'vienna' }
# Update capital of germany
europe['germany'] = 'berlin'
# Remove australia
del(europe['australia'])

# Print out the capital of France
print(europe['france']['capital'])
# Create sub-dictionary data
data = { 'capital':'rome', 'population':59.83 }
# Add data to europe under key 'italy'
europe['italy'] = data
# Print europe
print(europe)

Dictionary to DataFrame (1)
# Import pandas as pd
import pandas as pd
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index = row_labels
# Print cars again
print(cars)

CSV to DataFrame (1)
Run the code with Run Code and assert that the first column should
actually be used as row labels.
Specify the index_col argument inside pd.read_csv(): set it to 0, so
that the first column is used as row labels.
Has the printout of cars improved now?

# Import pandas as pd
import pandas as pd
# Fix import by including index_col
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out cars
print(cars)

Square Brackets (1)
Use single square brackets to print out the country column of cars as a
Pandas Series.
Use double square brackets to print out the country column of cars as
a Pandas DataFrame.
Use double square brackets to print out a DataFrame with both
the country and drives_right columns of cars, in this order.

# Import cars data
import pandas as pd

Use loc or iloc to select the observation corresponding to Japan as a
Series. The label of this row is JPN, the index is 2. Make sure to print
the resulting Series.
Use loc or iloc to select the observations for Australia and Egypt as a
DataFrame. You can find out about the labels/indexes of these rows
by inspecting cars in the IPython Shell. Make sure to print the
resulting DataFrame.

Print out the drives_right column as a Series using loc or iloc.
Print out the drives_right column as a DataFrame using loc or iloc.
Print out both the cars_per_cap and drives_right column as a
DataFrame using loc or iloc.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out drives_right column as Series
print(cars.iloc[:, 2])
# Print out drives_right column as DataFrame
print(cars.iloc[:, [2]])
# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])

Equality
In the editor on the right, write code to see if True equals False.
Write Python code to check if -5 * 15 is not equal to 75.
Ask Python whether the strings "pyscript" and "PyScript" are equal.
What happens if you compare booleans and integers? Write code to
see if True and 1 are equal.

# Comparison of booleans
print(True == False)
# Comparison of integers
print(-5 * 15 != 75)
# Comparison of strings
print("pyscript" == "PyScript")
# Compare a boolean with a numeric
print(True == 1)

Greater and less than
Write Python expressions, wrapped in a print() function, to check
whether:
o x is greater than or equal to -10. x has already been defined for
you.
o "test" is less than or equal to y. y has already been defined for
you.
o True is greater than False.

# Comparison of integers
x = -3 * 6
print(x >= -10)
# Comparison of strings
y = "test"
print("test" <= y)
# Comparison of booleans
print(True > False)

Compare arrays
Which areas in my_house are greater than or equal to 18?
You can also compare two NumPy arrays element-wise. Which areas
in my_house are smaller than the ones in your_house?
Make sure to wrap both commands in a print() statement so that you
can inspect the output!

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
# my_house greater than or equal to 18
print(my_house >= 18)
# my_house less than your_house
print(my_house < your_house)
Boolean Operators

and, or, not (1)
Write Python expressions, wrapped in a print() function, to check
whether:
o my_kitchen is bigger than 10 and smaller than 18.
o my_kitchen is smaller than 14 or bigger than 17.
o double the area of my_kitchen is smaller than triple the area
of your_kitchen.

# Define variables
my_kitchen = 18.0
your_kitchen = 14.0
# my_kitchen bigger than 10 and smaller than 18?
print(my_kitchen > 10 and my_kitchen < 18)
# my_kitchen smaller than 14 or bigger than 17?
print(my_kitchen < 14 or my_kitchen > 17)
# Double my_kitchen smaller than triple your_kitchen?
print(my_kitchen * 2 < your_kitchen * 3)

and, or, not (2)
x=8
y=9
not(not(x < 3) and not(y > 14 or y > 10))
What will the result be if you execute these three commands in the IPython
Shell?
NB: Notice that not has a higher priority than and and or; it is executed first.

True
False
Running these commands will result in an error.

Boolean operators with NumPy
Generate boolean arrays that answer the following questions:
Which areas in my_house are greater than 18.5 or smaller than 10?
Which areas are smaller than 11 in both my_house and your_house?
Make sure to wrap both commands in a print() statement, so that you
can inspect the output.

# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])
# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))
# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))

if, elif, else

Warmup
To experiment with if and else a bit, have a look at this code sample:
area = 10.0
if(area < 9) :
    print("small")
elif(area < 12) :
    print("medium")
else :
    print("large")
What will the output be if you run this piece of code in the IPython Shell?

small
medium
large
The syntax is incorrect; this code will produce an error.

if
Examine the if statement that prints out "looking around in the
kitchen." if room equals "kit".
Write another if statement that prints out "big place!" if area is greater
than 15.

# Define variables
room = "kit"
area = 14.0
# if statement for room
if room == "kit" :
    print("looking around in the kitchen.")
# if statement for area
if area > 15 :
    print("big place!")

# Define variables
room = "kit"
area = 14.0
# if-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
else :
    print("looking around elsewhere.")
# if-else construct for area
if area > 15 :
    print("big place!")
else :
    print("pretty small.")

Customize further: elif

# Define variables
room = "bed"
area = 14.0
# if-elif-else construct for room
if room == "kit" :
    print("looking around in the kitchen.")
elif room == "bed":
    print("looking around in the bedroom.")
else :
    print("looking around elsewhere.")
# if-elif-else construct for area
if area > 15 :
    print("big place!")
elif area > 10 :
    print("medium size, nice!")
else :
    print("pretty small.")

Driving right (1)
Driving right (2)

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Convert code to a one-liner
sel = cars[cars['drives_right']]
# Print sel
print(sel)

Cars per capita (1)
Select the cars_per_cap column from cars as a Pandas Series and store
it as cpc.
Use cpc in combination with a comparison operator and 500. You
want to end up with a boolean Series that’s True if the corresponding
country has a cars_per_cap of more than 500 and False otherwise.
Store this boolean Series as many_cars.
Use many_cars to subset cars, similar to what you did before. Store
the result as car_maniac.
Print out car_maniac to see if you got it right.

Use the code sample provided to create a DataFrame medium, that
includes all the observations of cars that have
a cars_per_cap between 100 and 500.
Print out medium.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Import numpy, you'll need this
import numpy as np
# Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]
# Print medium
print(medium)

while: warming up
Can you tell how many printouts the following while loop will do?
x=1
while x < 4 :
    print(x)
    x=x+1

0
1
2
3
4

Basic while loop
Create the variable offset with an initial value of 8.
Code a while loop that keeps running as long as offset is not equal
to 0. Inside the while loop:
o Print out the sentence "correcting...".
o Next, decrease the value of offset by 1. You can do this
with offset = offset - 1.
o Finally, still within your loop, print out offset so you can see
how it changes.

# Initialize offset
offset = 8
# Code the while loop
while offset != 0 :
    print("correcting...")
    offset = offset - 1
    print(offset)

Add conditionals
Inside the while loop, complete the if-else statement:
o If offset is greater than zero, you should decrease offset by 1.
o Else, you should increase offset by 1.
If you’ve coded things correctly, hitting Submit Answer should work
this time.

# Initialize offset
offset = -6
# Code the while loop
while offset != 0 :
    print("correcting...")
    if offset > 0 :
        offset = offset - 1
    else :
        offset = offset + 1
    print(offset)

Loop over a list
Write a for loop that iterates over all elements of the areas list and prints out
every element separately.

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Code the for loop
for area in areas :
    print(area)

Indexes and values (1)
Adapt the for loop in the sample code to use enumerate() and use two
iterator variables.
Update the print() statement so that on each run, a line of the
form "room x: y" should be printed, where x is the index of the list
element and y is the actual list element, i.e. the area. Make sure to
print out this exact string, with the correct spacing.

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Change for loop to use enumerate() and update print()
for index, area in enumerate(areas) :
    print("room " + str(index) + ": " + str(area))

Indexes and values (2)
For non-programmer folks, room 0: 11.25 is strange. Wouldn’t it be better if
the count started at 1?
Adapt the print() function in the for loop so that the first printout
becomes "room 1: 11.25", the second one "room 2: 18.0" and so on.

# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]
# Adapt the printout
for index, area in enumerate(areas) :
    print("room " + str(index + 1) + ": " + str(area))

Loop over list of lists
Write a for loop that goes through each sublist of house and prints out "the x is
y sqm", where x is the name of the room and y is the area of the room.

# house list of lists
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]
# Build a for loop from scratch
for x in house :
    print("the " + x[0] + " is " + str(x[1]) + " sqm")

Loop over dictionary

# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
# Iterate over europe
for key, value in europe.items() :
    print("the capital of " + str(key) + " is " + str(value))

Loop over NumPy array
Import the numpy package under the local alias np.
Write a for loop that iterates over all elements in np_height and prints
out "x inches" for each element, where x is the value in the array.
Write a for loop that visits every element of the np_baseball array and
prints it out.

# edited/added
import pandas as pd
mlb = pd.read_csv('baseball.csv')
np_height = np.array(mlb['Height'])
np_weight = np.array(mlb['Weight'])
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]
np_baseball = np.array(baseball)
# Import numpy as np
import numpy as np
# For loop over np_height
for x in np_height[:5]: # edited/added
    print(str(x) + " inches")
# For loop over np_baseball
for x in np.nditer(np_baseball) :
    print(x)

Loop over DataFrame (1)
Write a for loop that iterates over the rows of cars and on each iteration
performs two print() calls: one to print out the row label and one to print out
all of the row's contents.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Iterate over rows of cars
for lab, row in cars.iterrows() :
    print(lab)
    print(row)

Loop over DataFrame (2)
Using the iterators lab and row, adapt the code in the for loop such
that the first iteration prints out "US: 809", the second iteration "AUS:
731", and so on.
The output should be in the form "country: cars_per_cap". Make sure
to print out this exact string (with the correct spacing).
o You can use str() to convert your integer data to a string so
that you can print it in conjunction with the country label.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Adapt for loop
for lab, row in cars.iterrows() :
    print(lab + ": " + str(row['cars_per_cap']))

Add column (1)
Use a for loop to add a new column, named COUNTRY, that contains
an uppercase version of the country names in the "country" column.
You can use the string method upper() for this.
To see if your code worked, print out cars. Don’t indent this code, so
that it’s not part of the for loop.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows() :
    cars.loc[lab, "COUNTRY"] = row["country"].upper()
# Print cars
print(cars)

Add column (2)
Replace the for loop with a one-liner that uses .apply(str.upper). The
call should give the same result: a column COUNTRY should be
added to cars, containing an uppercase version of the country names.
As usual, print out cars to see the fruits of your hard labor.

# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Use .apply(str.upper)
cars["COUNTRY"] = cars["country"].apply(str.upper)

print(np.random.randint(1,7))
seed(): sets the random seed, so that your results are reproducible
between simulations. As an argument, it takes an integer of your
choosing. If you call the function, no output will be generated.
rand(): if you don’t specify any arguments, it generates a random float
between zero and one.

Import numpy as np.
Use seed() to set the seed; as an argument, pass 123.
Generate your first random float with rand() and print it out.

# Import numpy as np
import numpy as np
# Set the seed
np.random.seed(123)
# Generate and print random float
print(np.random.rand())

Roll the dice. Use randint() to create the variable dice.
Finish the if-elif-else construct by replacing ___:
If dice is 1 or 2, you go one step down.
If dice is 3, 4 or 5, you go one step up.
Else, you throw the dice again. The number of eyes is the number of
steps you go up.
Print out dice and step. Given the value of dice, was step updated
correctly?

# NumPy is imported, seed is set
# Starting step
step = 50
# Roll the dice
dice = np.random.randint(1,7)
# Finish the control construct
if dice <= 2 :
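
The dice rules described in this exercise (1 or 2 means one step down, 3 to 5 means one step up, 6 means throw again and climb by that number) can be sketched as a small function. This is a sketch under stated assumptions, not the exercise solution: it uses the standard-library random module instead of np.random so that it runs without NumPy, and the names update_step and reroll are hypothetical.

```python
import random

def update_step(step, dice, reroll):
    """Return the new step after one turn, given the dice roll."""
    if dice <= 2:
        # 1 or 2: go one step down
        return step - 1
    elif dice <= 5:
        # 3, 4 or 5: go one step up
        return step + 1
    else:
        # 6: go up by the number of eyes of a second throw
        return step + reroll

random.seed(123)               # fixed seed so that reruns give the same rolls
step = 50
dice = random.randint(1, 6)    # both bounds inclusive, like np.random.randint(1, 7)
reroll = random.randint(1, 6)  # second throw, only used when dice is 6
new_step = update_step(step, dice, reroll)
print(dice, new_step)
```

Passing the second throw in as an argument keeps the function deterministic and easy to check by hand. The cost is that the sketch always draws the second throw, whereas the rules only call for it when a 6 is rolled.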
Import matplotlib.pyplot as plt.
Use plt.plot() to plot random_walk.
Finish off with plt.show() to actually display the plot.

# NumPy is imported, seed is set
# Initialization
random_walk = [0]
for x in range(100) :
    step = random_walk[-1]
    dice = np.random.randint(1,7)

Fill in the specification of the for loop so that the random walk is
simulated 10 times.
After the random_walk array is entirely populated, append the array
to the all_walks list.
Finally, after the top-level for loop, print out all_walks.

# NumPy is imported; seed is set
# Initialize all_walks (don't change this line)
all_walks = []
# Simulate random walk 10 times
for i in range(10) :
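
Since the loop bodies above are cut off, here is a minimal self-contained sketch of simulating several random walks and collecting them in a list. It rests on assumptions that are not confirmed by the text: it uses the standard-library random module rather than np.random, it reuses the dice rules from the earlier exercise, and it clamps the walk at step 0 so that it never goes negative. The function name random_walk is hypothetical here (the exercises use that name for a list).

```python
import random

def random_walk(n_steps, rng):
    """Simulate one walk and return the list of positions, starting at 0."""
    walk = [0]
    for _ in range(n_steps):
        step = walk[-1]
        dice = rng.randint(1, 6)
        if dice <= 2:
            # Assumption: the walk cannot drop below step 0
            step = max(0, step - 1)
        elif dice <= 5:
            step = step + 1
        else:
            # 6: throw again and climb by that number
            step = step + rng.randint(1, 6)
        walk.append(step)
    return walk

rng = random.Random(123)  # seeded generator for reproducible walks
all_walks = [random_walk(100, rng) for _ in range(10)]

# Final position of each of the 10 simulated walks
print([walk[-1] for walk in all_walks])
```

Each walk has 101 entries (the start plus 100 steps), so a list such as all_walks can be converted to a 2D array or plotted walk by walk.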
48.8%
76.6%
78.4%
95.9%