
Python Data Science Toolbox


PYTHON DATA SCIENCE TOOLBOX (PART 2)

Iterators vs. Iterables

The environment has been pre-loaded with the variables flash1 and flash2. Try printing out their values with print() and next() to figure out which is an iterable and which is an iterator.

Both flash1 and flash2 are iterators.

Both flash1 and flash2 are iterables.

flash1 is an iterable and flash2 is an iterator.

Iterating over iterables (1)

• Create a for loop to loop over flash and print the values in the list. Use person as the loop variable.
• Create an iterator for the list flash and assign the result to superhero.
• Print each of the items from superhero using next() 4 times.

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

Iterating over iterables (2)

• Create an iterator object small_value over range(3) using the function iter().
• Using a for loop, iterate over range(3), printing the value for every iteration. Use num as the loop variable.
• Create an iterator object googol over range(10 ** 100).

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

Iterators as function arguments

• Create a range object that would produce the values from 10 to 20 using range(). Assign the result to values.
• Use the list() function to create a list of values from the range object values. Assign the result to values_list.
• Use the sum() function to get the sum of the values from 10 to 20 from the range object values. Assign the result to values_sum.

# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

Using enumerate

• Create a list of tuples from mutants and assign the result to mutant_list. Make sure you generate the tuples using enumerate() and turn the result from it into a list using list().
• Complete the first for loop by unpacking the tuples generated by calling enumerate() on mutants. Use index1 for the index and value1 for the value when unpacking the tuple.
• Complete the second for loop similarly as with the first, but this time change the starting index to start from 1 by passing it in as an argument to the start parameter of enumerate(). Use index2 for the index and value2 for the value when unpacking the tuple.

# Create a list of strings: mutants
mutants = ['charles xavier',
           'bobby drake',
           'kurt wagner',
           'max eisenhardt',
           'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

Using zip

• Using zip() with list(), create a list of tuples from the three lists mutants, aliases, and powers (in that order) and assign the result to mutant_data.
• Using zip(), create a zip object called mutant_zip from the three lists mutants, aliases, and powers.
• Complete the for loop by unpacking the zip object you created and printing the tuple values. Use value1, value2, value3 for the values from each of mutants, aliases, and powers, in that order.

# edited/added
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis',
          'intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

Using * and zip to ‘unzip’

• Create a zip object by using zip() on mutants and powers, in that order. Assign the result to z1.
• Print the tuples in z1 by unpacking them into positional arguments using the * operator in a print() call.
• Because the previous print() call would have exhausted the elements in z1, recreate the zip object you defined earlier and assign the result again to z1.
• ‘Unzip’ the tuples in z1 by unpacking them into positional arguments using the * operator in a zip() call. Assign the results to result1 and result2, in that order.
• The last print() statements print the output of comparing result1 to mutants and result2 to powers. Click Submit Answer to see if the unpacked result1 and result2 are equivalent to mutants and powers, respectively.

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
# (zip() yields tuples, while mutants and powers are lists here, so convert first)
print(result1 == tuple(mutants))
print(result2 == tuple(powers))

Processing large amounts of Twitter data

• Initialize an empty dictionary counts_dict for storing the results of processing the Twitter data.
• Iterate over the 'tweets.csv' file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv() with a chunksize of 10.
• In the inner loop, iterate over the column 'lang' in chunk by using a for loop. Use the loop variable entry.

# edited/added
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}
# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

Extracting information for large amounts of Twitter data

• Define the function count_entries(), which has 3 parameters. The first parameter is csv_file for the filename, the second is c_size for the chunk size, and the last is colname for the column name.
• Iterate over the file in csv_file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv(), passing c_size to chunksize.
• In the inner loop, iterate over the column given by colname in chunk by using a for loop. Use the loop variable entry.
• Call the count_entries() function by passing to it the filename 'tweets.csv', the size of chunks 10, and the name of the column to count, 'lang'. Assign the result of the call to the variable result_counts.

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

Write a basic list comprehension

The following list has been pre-loaded in the environment.

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']
What would a list comprehension that produces a list of the first character of each string in doctor look like? Note that the list comprehension uses doc as the iterator variable. What will the output be?

The list comprehension is [for doc in doctor: doc[0]] and produces the list ['h', 'c', 'c', 't', 'w'].

The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].

The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].

List comprehension over iterables

You know that list comprehensions can be built over iterables. Given the objects below, which of these can we build list comprehensions over?

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601

You can build list comprehensions over all the objects except the string of number characters jean.

You can build list comprehensions over all the objects except the string lists doctor and flash.

You can build list comprehensions over all the objects except range(50).

You can build list comprehensions over all the objects except the integer object valjean.

Writing list comprehensions

• Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.

# Create list comprehension: squares
squares = [i**2 for i in range(0, 10)]

Nested list comprehensions

• In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.
• In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as
the iterator variable; note that you won’t be needing this variable to create values in the list of lists.

# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

Using conditionals in comprehensions (1)

• Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)

Using conditionals in comprehensions (2)

• In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

Dict comprehensions

Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key> : <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}

# Print the new dictionary
print(new_fellowship)

List comprehensions vs. generators

To help with that task, the following code has been pre-loaded in the environment:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

Try to play around with fellow1 and fellow2 by figuring out their types and printing out their values. Based on your observations and what you can recall from the video, select from the options below the best description for the difference between list comprehensions and generators.

List comprehensions and generators are not different at all; they are just different ways of writing the same thing.

A list comprehension produces a list as output, a generator produces a generator object.

A list comprehension produces a list as output that can be iterated over, a generator produces a generator object that can’t be iterated over.

Write your own generator expressions

• Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
• Print the first 5 values by using next() appropriately in print().
• Print the rest of the values by using a for loop to iterate over the generator object.

# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

Changing the output in generator expressions

• Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
• Supply the correct iterable in the for loop for printing the values in the generator object.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

Build a generator

• Complete the function header for the function get_lengths() that has a single parameter, input_list.
• In the for loop in the function definition, yield the length of the strings in input_list.
• Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.
# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

List comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!

# edited/added
df = pd.read_csv('tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

Conditional list comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time.
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

Dictionaries for data science

• Create a zip object by calling zip() and passing to it feature_names and row_vals. Assign the result to zipped_lists.
• Create a dictionary from the zipped_lists zip object by calling dict() with zipped_lists. Assign the resulting dictionary to rs_dict.


# edited/added
feature_names = ['CountryName', 'CountryCode', 'IndicatorName',
                 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB',
            'Adolescent fertility rate (births per 1,000 women ages 15-19)',
            'SP.ADO.TFRT', '1960', '133.56090740552298']

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

Writing a function to help you

• Define the function lists2dict() with two parameters: first is list1 and second is list2.
• Return the resulting dictionary rs_dict in lists2dict().
• Call the lists2dict() function with the arguments feature_names and row_vals. Assign the result of the function call to rs_fxn.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

Using a list comprehension

• Inspect the contents of row_lists by printing the first two lists in row_lists.
• Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.
• Look at the first two dictionaries in list_of_dicts by printing them out.

# edited/added
import csv
with open('row_lists.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    row_lists = [row for row in reader]

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])
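
For readers without access to 'row_lists.csv', the lists2dict() and list comprehension steps can be tried on in-memory rows. This is a minimal sketch, not the course's own data: the first row is the row_vals list from the exercises, while the second row and the name sample_rows are invented purely for illustration.

```python
# Column names as given in the exercises.
feature_names = ['CountryName', 'CountryCode', 'IndicatorName',
                 'IndicatorCode', 'Year', 'Value']

# Stand-in for row_lists. The first row comes from the exercises;
# the second row is hypothetical and exists only for illustration.
sample_rows = [
    ['Arab World', 'ARB',
     'Adolescent fertility rate (births per 1,000 women ages 15-19)',
     'SP.ADO.TFRT', '1960', '133.56090740552298'],
    ['Example Land', 'EXL', 'Example indicator', 'EX.IND', '1961', '1.5'],
]


def lists2dict(list1, list2):
    """Return a dictionary where list1 provides the keys and list2 the values."""
    return dict(zip(list1, list2))


# One dictionary per row, built with a list comprehension.
list_of_dicts = [lists2dict(feature_names, row) for row in sample_rows]

print(list_of_dicts[0]['CountryCode'])
print(list_of_dicts[1]['Year'])
```

Note that zip() stops at the shorter of its inputs, so a row with fewer entries than feature_names would silently produce a dictionary with fewer keys.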
Turning this all into a DataFrame

• To use the DataFrame() function you need, first import the pandas package with the alias pd.
• Create a DataFrame from the list of dictionaries in list_of_dicts by calling pd.DataFrame(). Assign the resulting DataFrame to df.
• Inspect the contents of df by printing the head of the DataFrame. The head of the DataFrame df can be accessed by calling df.head().

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())

Processing data in chunks (1)

• Use open() to bind the csv file 'world_dev_ind.csv' as file in the context manager.
• Complete the for loop so that it iterates 1000 times to perform the loop body and process only the first 1000 rows of data of the file.

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)

Writing a generator to load data in chunks (2)

• In the function read_large_file(), read a line from file_object by using the method readline(). Assign the result to data.
• In the function read_large_file(), yield the line read from the file, data.
• In the context manager, create a generator object gen_file by calling your generator function read_large_file() and passing file to it.
• Print the first three lines produced by the generator object gen_file using next().

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))

Writing a generator to load data in chunks (3)

• Bind the file 'world_dev_ind.csv' to file in the context manager with open().
• Complete the for loop so that it iterates over the generator from the call to read_large_file() to process all the rows of the file.

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        row = line.split(',')
        first_col = row[0]

        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        else:
            counts_dict[first_col] = 1

# Print
print(counts_dict)
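
The read_large_file() generator can also be exercised without the 'world_dev_ind.csv' file. The sketch below assumes an in-memory stand-in built with io.StringIO (the name fake_file and its rows are hypothetical) and swaps the hand-written counting dictionary for collections.Counter, which is a different technique from the one used in the exercises but produces the same kind of tally.

```python
import io
from collections import Counter


def read_large_file(file_object):
    """Yield one line at a time until readline() returns an empty string."""
    while True:
        data = file_object.readline()
        if not data:
            break
        yield data


# Hypothetical stand-in for world_dev_ind.csv: a header plus four rows.
fake_file = io.StringIO(
    "CountryName,CountryCode,Year\n"
    "Arab World,ARB,1960\n"
    "Arab World,ARB,1961\n"
    "Caribbean small states,CSS,1960\n"
    "Arab World,ARB,1962\n"
)

# Skip the header line, then count first-column values lazily.
fake_file.readline()
counts = Counter(line.split(',')[0] for line in read_large_file(fake_file))

print(counts)
```

Because only one line is held in memory at a time, the same pattern scales to files that do not fit in memory. Python file objects are themselves iterators over lines, so iterating over the file directly would behave the same way; the explicit generator mainly makes the mechanism visible.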
Writing an iterator to load data in chunks (1)

• Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader.
• Print the first two chunks from df_reader.

# Import the pandas package
import pandas as pd

# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)

# Print two chunks
print(next(df_reader))
print(next(df_reader))

Writing an iterator to load data in chunks (2)

• Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks of size 1000. Assign the result to urb_pop_reader.
• Get the first DataFrame chunk from the iterable urb_pop_reader and assign this to df_urb_pop.
• Select only the rows of df_urb_pop that have a 'CountryCode' of 'CEB'. To do this, compare whether df_urb_pop['CountryCode'] is equal to 'CEB' within the square brackets in df_urb_pop[____].
• Using zip(), zip together the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb. Assign the resulting zip object to pops.

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)

# Check out the head of the DataFrame
print(df_urb_pop.head())

# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])

# Turn zip object into list: pops_list
pops_list = list(pops)

# Print pops_list
print(pops_list)

Writing an iterator to load data in chunks (3)

• Write a list comprehension to generate a list of values from pops_list for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int().
• Create a scatter plot where the x-axis values come from the 'Year' column and the y-axis values come from the 'Total Urban Population' column.

# edited/added
import matplotlib.pyplot as plt

# Code from previous exercise
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (4)

• Initialize an empty DataFrame data using pd.DataFrame().
• In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset.
• Concatenate data and df_pop_ceb by passing a list of the DataFrames to pd.concat().

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Initialize empty DataFrame: data
data = pd.DataFrame()

# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])

# Plot urban population data
data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (5)

• Define the function plot_pop() that has two arguments: first is filename for the file to process and second is country_code for the country to be processed in the dataset.
• Call plot_pop() to process the data for country code 'CEB' in the file 'ind_pop_data.csv'.
• Call plot_pop() to process the data for country code 'ARB' in the file 'ind_pop_data.csv'.

# Define plot_pop()
def plot_pop(filename, country_code):
    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:
        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

        # Concatenate DataFrame chunk to the end of data: data
        data = pd.concat([data, df_pop_ceb])

    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()

# Set the filename: fn
fn = 'ind_pop_data.csv'

# Call plot_pop for country code 'CEB'
plot_pop(fn, 'CEB')

# Call plot_pop for country code 'ARB'
plot_pop(fn, 'ARB')
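
The 'Total Urban Population' expression used in these exercises, int(tup[0] * tup[1] * 0.01), converts a percentage to a fraction and then truncates toward zero. The sketch below shows that arithmetic on plain tuples so it can be checked without pandas. The numbers are hypothetical, not taken from 'ind_pop_data.csv', and the round() variant is shown only as a possible alternative, not as what the exercises ask for.

```python
# Hypothetical (total population, urban percentage) pairs; not real data.
pops_list = [(1000, 50.0), (1001, 33.3), (1000, 99.99)]

# Same expression as in the exercises: scale the percentage, then truncate.
truncated = [int(total * pct * 0.01) for total, pct in pops_list]

# Possible alternative: round to the nearest integer instead of truncating.
rounded = [round(total * pct * 0.01) for total, pct in pops_list]

print(truncated)
print(rounded)
```

The last pair shows the difference: a product of roughly 999.9 becomes 999 with int() but 1000 with round(). For population-scale values this off-by-one is usually negligible, which may be why the exercises settle for int().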
