Python Data Science Toolbox

Uploaded by

Anh Thư Trần Võ

PYTHON DATA SCIENCE TOOLBOX (PART 2)

Iterators vs. Iterables

The environment has been pre-loaded with the variables flash1 and flash2. Try printing out their values with print() and next() to figure out which is an iterable and which is an iterator.

Options:
• Both flash1 and flash2 are iterators.
• Both flash1 and flash2 are iterables.
• flash1 is an iterable and flash2 is an iterator.

Iterating over iterables (1)

• Create a for loop to loop over flash and print the values in the list. Use person as the loop variable.
• Create an iterator for the list flash and assign the result to superhero.
• Print each of the items from superhero using next() 4 times.

# Create a list of strings: flash
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

# Print each list item in flash using a for loop
for person in flash:
    print(person)

# Create an iterator for flash: superhero
superhero = iter(flash)

# Print each item from the iterator
print(next(superhero))
print(next(superhero))
print(next(superhero))
print(next(superhero))

Iterating over iterables (2)

• Create an iterator object small_value over range(3) using the function iter().
• Using a for loop, iterate over range(3), printing the value for every iteration. Use num as the loop variable.
• Create an iterator object googol over range(10 ** 100).

# Create an iterator for range(3): small_value
small_value = iter(range(3))

# Print the values in small_value
print(next(small_value))
print(next(small_value))
print(next(small_value))

# Loop over range(3) and print the values
for num in range(3):
    print(num)

# Create an iterator for range(10 ** 100): googol
googol = iter(range(10 ** 100))

# Print the first 5 values from googol
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))
print(next(googol))

Iterators as function arguments

• Create a range object that would produce the values from 10 to 20 using range(). Assign the result to values.
• Use the list() function to create a list of values from the range object values. Assign the result to values_list.
• Use the sum() function to get the sum of the values from 10 to 20 from the range object values. Assign the result to values_sum.

# Create a range object: values
values = range(10, 21)

# Print the range object
print(values)

# Create a list of integers: values_list
values_list = list(values)

# Print values_list
print(values_list)

# Get the sum of values: values_sum
values_sum = sum(values)

# Print values_sum
print(values_sum)

Using enumerate

• Create a list of tuples from mutants and assign the result to mutant_list. Make sure you generate the tuples using enumerate() and turn the result from it into a list using list().
• Complete the first for loop by unpacking the tuples generated by calling enumerate() on mutants. Use index1 for the index and value1 for the value when unpacking the tuple.
• Complete the second for loop similarly as with the first, but this time change the starting index to start from 1 by passing it in as an argument to the start parameter of enumerate(). Use index2 for the index and value2 for the value when unpacking the tuple.

# Create a list of strings: mutants
mutants = ['charles xavier',
           'bobby drake',
           'kurt wagner',
           'max eisenhardt',
           'kitty pryde']

# Create a list of tuples: mutant_list
mutant_list = list(enumerate(mutants))

# Print the list of tuples
print(mutant_list)

# Unpack and print the tuple pairs
for index1, value1 in enumerate(mutants):
    print(index1, value1)

# Change the start index
for index2, value2 in enumerate(mutants, start=1):
    print(index2, value2)

Using zip

• Using zip() with list(), create a list of tuples from the three lists mutants, aliases, and powers (in that order) and assign the result to mutant_data.
• Using zip(), create a zip object called mutant_zip from the three lists mutants, aliases, and powers.
• Complete the for loop by unpacking the zip object you created and printing the tuple values. Use value1, value2, value3 for the values from each of mutants, aliases, and powers, in that order.

# edited/added
aliases = ['prof x', 'iceman', 'nightcrawler', 'magneto', 'shadowcat']
powers = ['telepathy', 'thermokinesis', 'teleportation', 'magnetokinesis',
          'intangibility']

# Create a list of tuples: mutant_data
mutant_data = list(zip(mutants, aliases, powers))

# Print the list of tuples
print(mutant_data)

# Create a zip object using the three lists: mutant_zip
mutant_zip = zip(mutants, aliases, powers)

# Print the zip object
print(mutant_zip)

# Unpack the zip object and print the tuple values
for value1, value2, value3 in mutant_zip:
    print(value1, value2, value3)

Using * and zip to 'unzip'

• Create a zip object by using zip() on mutants and powers, in that order. Assign the result to z1.
• Print the tuples in z1 by unpacking them into positional arguments using the * operator in a print() call.
• Because the previous print() call would have exhausted the elements in z1, recreate the zip object you defined earlier and assign the result again to z1.
• 'Unzip' the tuples in z1 by unpacking them into positional arguments using the * operator in a zip() call. Assign the results to result1 and result2, in that order.
• The last print() statements print the output of comparing result1 to mutants and result2 to powers. Click Submit Answer to see if the unpacked result1 and result2 are equivalent to mutants and powers, respectively.

# Create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# Print the tuples in z1 by unpacking with *
print(*z1)

# Re-create a zip object from mutants and powers: z1
z1 = zip(mutants, powers)

# 'Unzip' the tuples in z1 by unpacking with * and zip(): result1, result2
result1, result2 = zip(*z1)

# Check if unpacked tuples are equivalent to original tuples
# (note: zip() yields tuples, so with mutants and powers defined as lists,
# as above, both comparisons print False even though the elements match)
print(result1 == mutants)
print(result2 == powers)

Processing large amounts of Twitter data

• Initialize an empty dictionary counts_dict for storing the results of processing the Twitter data.
• Iterate over the 'tweets.csv' file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv() with a chunksize of 10.
• In the inner loop, iterate over the column 'lang' in chunk by using a for loop. Use the loop variable entry.

# edited/added
import pandas as pd

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Iterate over the file chunk by chunk
for chunk in pd.read_csv('tweets.csv', chunksize=10):

    # Iterate over the column in DataFrame
    for entry in chunk['lang']:
        if entry in counts_dict.keys():
            counts_dict[entry] += 1
        else:
            counts_dict[entry] = 1

# Print the populated dictionary
print(counts_dict)

Extracting information for large amounts of Twitter data

• Define the function count_entries(), which has 3 parameters. The first parameter is csv_file for the filename, the second is c_size for the chunk size, and the last is colname for the column name.
• Iterate over the file in csv_file by using a for loop. Use the loop variable chunk and iterate over the call to pd.read_csv(), passing c_size to chunksize.
• In the inner loop, iterate over the column given by colname in chunk by using a for loop. Use the loop variable entry.
• Call the count_entries() function by passing to it the filename 'tweets.csv', the size of chunks 10, and the name of the column to count, 'lang'. Assign the result of the call to the variable result_counts.

# Define count_entries()
def count_entries(csv_file, c_size, colname):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Iterate over the file chunk by chunk
    for chunk in pd.read_csv(csv_file, chunksize=c_size):

        # Iterate over the column in DataFrame
        for entry in chunk[colname]:
            if entry in counts_dict.keys():
                counts_dict[entry] += 1
            else:
                counts_dict[entry] = 1

    # Return counts_dict
    return counts_dict

# Call count_entries(): result_counts
result_counts = count_entries('tweets.csv', 10, 'lang')

# Print result_counts
print(result_counts)

Write a basic list comprehension

The following list has been pre-loaded in the environment.

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

What would a list comprehension that produces a list of the first character of each string in doctor look like? Note that the list comprehension uses doc as the iterator variable. What will the output be?

Options:
• The list comprehension is [for doc in doctor: doc[0]] and produces the list ['h', 'c', 'c', 't', 'w'].
• The list comprehension is [doc[0] for doc in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].
• The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c', 'c', 't', 'w'].

List comprehension over iterables

You know that list comprehensions can be built over iterables. Given the following objects, which of these can we build list comprehensions over?

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']

range(50)

underwood = 'After all, we are nothing more or less than what we choose to reveal.'

jean = '24601'

flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

valjean = 24601

Options:
• You can build list comprehensions over all the objects except the string of number characters jean.
• You can build list comprehensions over all the objects except the string lists doctor and flash.
• You can build list comprehensions over all the objects except range(50).
• You can build list comprehensions over all the objects except the integer object valjean.

Writing list comprehensions

• Using the range of numbers from 0 to 9 as your iterable and i as your iterator variable, write a list comprehension that produces a list of numbers consisting of the squared values of i.

# Create list comprehension: squares
squares = [i**2 for i in range(0,10)]

Nested list comprehensions

• In the inner list comprehension - that is, the output expression of the nested list comprehension - create a list of values from 0 to 4 using range(). Use col as the iterator variable.
• In the iterable part of your nested list comprehension, use range() to count 5 rows - that is, create a list of values from 0 to 4. Use row as the iterator variable; note that you won't be needing this variable to create values in the list of lists.

# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]

# Print the matrix
for row in matrix:
    print(row)

Using conditionals in comprehensions (1)

• Use member as the iterator variable in the list comprehension. For the conditional, use len() to evaluate the iterator variable. Note that you only want strings with 7 characters or more.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]

# Print the new list
print(new_fellowship)

Using conditionals in comprehensions (2)

• In the output expression, keep the string as-is if the number of characters is >= 7, else replace it with an empty string - that is, '' or "".

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]

# Print the new list
print(new_fellowship)

Dict comprehensions

Create a dict comprehension where the key is a string in fellowship and the value is the length of the string. Remember to use the syntax <key> : <value> in the output expression part of the comprehension to create the members of the dictionary. Use member as the iterator variable.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# Create dict comprehension: new_fellowship
new_fellowship = { member:len(member) for member in fellowship }

# Print the new dictionary
print(new_fellowship)

List comprehensions vs. generators

To help with that task, the following code has been pre-loaded in the environment:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']

# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]

# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

Try to play around with fellow1 and fellow2 by figuring out their types and printing out their values. Based on your observations and what you can recall from the video, select from the options below the best description for the difference between list comprehensions and generators.

• List comprehensions and generators are not different at all; they are just different ways of writing the same thing.
• A list comprehension produces a list as output, a generator produces a generator object.
• A list comprehension produces a list as output that can be iterated over, a generator produces a generator object that can't be iterated over.

Write your own generator expressions

• Create a generator object that will produce values from 0 to 30. Assign the result to result and use num as the iterator variable in the generator expression.
• Print the first 5 values by using next() appropriately in print().
• Print the rest of the values by using a for loop to iterate over the generator object.

# Create generator object: result
result = (num for num in range(31))

# Print the first 5 values
print(next(result))
print(next(result))
print(next(result))
print(next(result))
print(next(result))

# Print the rest of the values
for value in result:
    print(value)

Changing the output in generator expressions

• Write a generator expression that will generate the lengths of each string in lannister. Use person as the iterator variable. Assign the result to lengths.
• Supply the correct iterable in the for loop for printing the values in the generator object.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Create a generator object: lengths
lengths = (len(person) for person in lannister)

# Iterate over and print the values in lengths
for value in lengths:
    print(value)

Build a generator

• Complete the function header for the function get_lengths() that has a single parameter, input_list.
• In the for loop in the function definition, yield the length of the strings in input_list.
• Complete the iterable part of the for loop for printing the values generated by the get_lengths() generator function. Supply the call to get_lengths(), passing in the list lannister.

# Create a list of strings
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Define generator function get_lengths
def get_lengths(input_list):
    """Generator function that yields the
    length of the strings in input_list."""

    # Yield the length of a string
    for person in input_list:
        yield len(person)

# Print the values generated by get_lengths()
for value in get_lengths(lannister):
    print(value)

List comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time. Fun fact: the extracted column in tweet_time here is a Series data structure!
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Remember that Python uses 0-based indexing!

# edited/added
df = pd.read_csv('tweets.csv')

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time]

# Print the extracted times
print(tweet_clock_time)

Conditional list comprehensions for time-stamped data

• Extract the column 'created_at' from df and assign the result to tweet_time.
• Create a list comprehension that extracts the time from each row in tweet_time. Each row is a string that represents a timestamp, and you will access the 12th to 19th characters in the string to extract the time. Use entry as the iterator variable and assign the result to tweet_clock_time. Additionally, add a conditional expression that checks whether entry[17:19] is equal to '19'.

# Extract the created_at column from df: tweet_time
tweet_time = df['created_at']

# Extract the clock time: tweet_clock_time
tweet_clock_time = [entry[11:19] for entry in tweet_time if entry[17:19] == '19']

# Print the extracted times
print(tweet_clock_time)

Dictionaries for data science

• Create a zip object by calling zip() and passing to it feature_names and row_vals. Assign the result to zipped_lists.
• Create a dictionary from the zipped_lists zip object by calling dict() with zipped_lists. Assign the resulting dictionary to rs_dict.

# edited/added
feature_names = ['CountryName', 'CountryCode', 'IndicatorName',
                 'IndicatorCode', 'Year', 'Value']
row_vals = ['Arab World', 'ARB',
            'Adolescent fertility rate (births per 1,000 women ages 15-19)',
            'SP.ADO.TFRT', '1960', '133.56090740552298']

# Zip lists: zipped_lists
zipped_lists = zip(feature_names, row_vals)

# Create a dictionary: rs_dict
rs_dict = dict(zipped_lists)

# Print the dictionary
print(rs_dict)

Writing a function to help you

• Define the function lists2dict() with two parameters: first is list1 and second is list2.
• Return the resulting dictionary rs_dict in lists2dict().
• Call the lists2dict() function with the arguments feature_names and row_vals. Assign the result of the function call to rs_fxn.

# Define lists2dict()
def lists2dict(list1, list2):
    """Return a dictionary where list1 provides
    the keys and list2 provides the values."""

    # Zip lists: zipped_lists
    zipped_lists = zip(list1, list2)

    # Create a dictionary: rs_dict
    rs_dict = dict(zipped_lists)

    # Return the dictionary
    return rs_dict

# Call lists2dict: rs_fxn
rs_fxn = lists2dict(feature_names, row_vals)

# Print rs_fxn
print(rs_fxn)

Using a list comprehension

• Inspect the contents of row_lists by printing the first two lists in row_lists.
• Create a list comprehension that generates a dictionary using lists2dict() for each sublist in row_lists. The keys are from the feature_names list and the values are the row entries in row_lists. Use sublist as your iterator variable and assign the resulting list of dictionaries to list_of_dicts.
• Look at the first two dictionaries in list_of_dicts by printing them out.

# edited/added
import csv
with open('row_lists.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    row_lists = [row for row in reader]

# Print the first two lists in row_lists
print(row_lists[0])
print(row_lists[1])

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Print the first two dictionaries in list_of_dicts
print(list_of_dicts[0])
print(list_of_dicts[1])
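
One detail worth keeping in mind when building dictionaries with zip(), as lists2dict() does: zip() stops at the shortest input, so a row with fewer values than there are feature names quietly loses keys. The sketch below illustrates this with a shortened, hypothetical row (short_row is not part of the exercises) and shows itertools.zip_longest() as one possible way to keep every key.

```python
from itertools import zip_longest

feature_names = ['CountryName', 'CountryCode', 'Year']
short_row = ['Arab World', 'ARB']  # hypothetical row with a missing value

# zip() stops at the shorter input, so 'Year' is silently dropped
truncated = dict(zip(feature_names, short_row))

# zip_longest() pads the shorter input instead (here with None)
padded = dict(zip_longest(feature_names, short_row, fillvalue=None))

print(truncated)
print(padded)
```

Whether padding with None is appropriate depends on the data; for a real file it may be safer to check len(sublist) against len(feature_names) before converting.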

Turning this all into a DataFrame

• To use the DataFrame() function you need, first import the pandas package with the alias pd.
• Create a DataFrame from the list of dictionaries in list_of_dicts by calling pd.DataFrame(). Assign the resulting DataFrame to df.
• Inspect the contents of df by printing the head of the DataFrame. The head of the DataFrame df can be accessed by calling df.head().

# Import the pandas package
import pandas as pd

# Turn list of lists into list of dicts: list_of_dicts
list_of_dicts = [lists2dict(feature_names, sublist) for sublist in row_lists]

# Turn list of dicts into a DataFrame: df
df = pd.DataFrame(list_of_dicts)

# Print the head of the DataFrame
print(df.head())

Processing data in chunks (1)

• Use open() to bind the csv file 'world_dev_ind.csv' as file in the context manager.
• Complete the for loop so that it iterates 1000 times to perform the loop body and process only the first 1000 rows of data of the file.

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Skip the column names
    file.readline()

    # Initialize an empty dictionary: counts_dict
    counts_dict = {}

    # Process only the first 1000 rows
    for j in range(0, 1000):

        # Split the current line into a list: line
        line = file.readline().split(',')

        # Get the value for the first column: first_col
        first_col = line[0]

        # If the column value is in the dict, increment its value
        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1

        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1

# Print the resulting dictionary
print(counts_dict)

Writing a generator to load data in chunks (2)

• In the function read_large_file(), read a line from file_object by using the method readline(). Assign the result to data.
• In the function read_large_file(), yield the line read from the file data.
• In the context manager, create a generator object gen_file by calling your generator function read_large_file() and passing file to it.
• Print the first three lines produced by the generator object gen_file using next().

# Define read_large_file()
def read_large_file(file_object):
    """A generator function to read a large file lazily."""

    # Loop indefinitely until the end of the file
    while True:

        # Read a line from the file: data
        data = file_object.readline()

        # Break if this is the end of the file
        if not data:
            break

        # Yield the line of data
        yield data

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Create a generator object for the file: gen_file
    gen_file = read_large_file(file)

    # Print the first three lines of the file
    print(next(gen_file))
    print(next(gen_file))
    print(next(gen_file))

Writing a generator to load data in chunks (3)

• Bind the file 'world_dev_ind.csv' to file in the context manager with open().
• Complete the for loop so that it iterates over the generator from the call to read_large_file() to process all the rows of the file.

# Initialize an empty dictionary: counts_dict
counts_dict = {}

# Open a connection to the file
with open('world_dev_ind.csv') as file:

    # Iterate over the generator from read_large_file()
    for line in read_large_file(file):

        row = line.split(',')
        first_col = row[0]

        if first_col in counts_dict.keys():
            counts_dict[first_col] += 1
        else:
            counts_dict[first_col] = 1

# Print
print(counts_dict)
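
To see the lazy behaviour of read_large_file() without needing 'world_dev_ind.csv' on disk, the generator can be tried on an in-memory file. This is only a sketch: the io.StringIO object and its three lines are made-up stand-ins, not data from the exercises. It also shows what happens once the generator runs out of lines, and that a file object can be iterated over line by line directly.

```python
import io

def read_large_file(file_object):
    """Yield lines from file_object one at a time."""
    while True:
        data = file_object.readline()
        if not data:
            break
        yield data

# In-memory stand-in for a CSV file (hypothetical contents)
fake_file = io.StringIO(
    "CountryName,CountryCode\n"
    "Arab World,ARB\n"
    "Caribbean small states,CSS\n"
)

gen_file = read_large_file(fake_file)
header = next(gen_file)
first_row = next(gen_file)
second_row = next(gen_file)

# A fourth call has nothing left to yield and raises StopIteration
try:
    next(gen_file)
    exhausted = False
except StopIteration:
    exhausted = True

# File objects are themselves iterators over lines
fake_file.seek(0)
lines_direct = [line for line in fake_file]

print(header, first_row, second_row, exhausted, len(lines_direct))
```

A for loop handles StopIteration automatically, which is why the loop over read_large_file(file) in the exercise ends cleanly at the end of the file.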

Writing an iterator to load data in chunks (1)

• Use pd.read_csv() to read in 'ind_pop.csv' in chunks of size 10. Assign the result to df_reader.
• Print the first two chunks from df_reader.

# Import the pandas package
import pandas as pd

# Initialize reader object: df_reader
df_reader = pd.read_csv('ind_pop.csv', chunksize=10)

# Print two chunks
print(next(df_reader))
print(next(df_reader))

Writing an iterator to load data in chunks (2)

• Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks of size 1000. Assign the result to urb_pop_reader.
• Get the first DataFrame chunk from the iterable urb_pop_reader and assign this to df_urb_pop.
• Select only the rows of df_urb_pop that have a 'CountryCode' of 'CEB'. To do this, compare whether df_urb_pop['CountryCode'] is equal to 'CEB' within the square brackets in df_urb_pop[____].
• Using zip(), zip together the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb. Assign the resulting zip object to pops.

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Get the first DataFrame chunk: df_urb_pop
df_urb_pop = next(urb_pop_reader)

# Check out the head of the DataFrame
print(df_urb_pop.head())

# Check out specific country: df_pop_ceb
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

# Zip DataFrame columns of interest: pops
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])

# Turn zip object into list: pops_list
pops_list = list(pops)

# Print pops_list
print(pops_list)

Writing an iterator to load data in chunks (3)

• Write a list comprehension to generate a list of values from pops_list for the new column 'Total Urban Population'. The output expression should be the product of the first and second element in each tuple in pops_list. Because the 2nd element is a percentage, you also need to either multiply the result by 0.01 or divide it by 100. In addition, note that the column 'Total Urban Population' should only be able to take on integer values. To ensure this, make sure you cast the output expression to an integer with int().
• Create a scatter plot where the x-axis values come from the 'Year' column and the y-axis values come from the 'Total Urban Population' column.

# edited/added
import matplotlib.pyplot as plt

# Code from previous exercise
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
df_urb_pop = next(urb_pop_reader)
df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
pops = zip(df_pop_ceb['Total Population'],
           df_pop_ceb['Urban population (% of total)'])
pops_list = list(pops)

# Use list comprehension to create new DataFrame column 'Total Urban Population'
df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

# Plot urban population data
df_pop_ceb.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (4)

• Initialize an empty DataFrame data using pd.DataFrame().
• In the for loop, iterate over urb_pop_reader to be able to process all the DataFrame chunks in the dataset.
• Concatenate data and df_pop_ceb by passing a list of the DataFrames to pd.concat().

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)

# Initialize empty DataFrame: data
data = pd.DataFrame()

# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:

    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

    # Zip DataFrame columns of interest: pops
    pops = zip(df_pop_ceb['Total Population'],
               df_pop_ceb['Urban population (% of total)'])

    # Turn zip object into list: pops_list
    pops_list = list(pops)

    # Use list comprehension to create new DataFrame column 'Total Urban Population'
    df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])

# Plot urban population data
data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (5)

• Define the function plot_pop() that has two arguments: first is filename for the file to process and second is country_code for the country to be processed in the dataset.
• Call plot_pop() to process the data for country code 'CEB' in the file 'ind_pop_data.csv'.
• Call plot_pop() to process the data for country code 'ARB' in the file 'ind_pop_data.csv'.

# Define plot_pop()
def plot_pop(filename, country_code):

    # Initialize reader object: urb_pop_reader
    urb_pop_reader = pd.read_csv(filename, chunksize=1000)

    # Initialize empty DataFrame: data
    data = pd.DataFrame()

    # Iterate over each DataFrame chunk
    for df_urb_pop in urb_pop_reader:

        # Check out specific country: df_pop_ceb
        df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == country_code]

        # Zip DataFrame columns of interest: pops
        pops = zip(df_pop_ceb['Total Population'],
                   df_pop_ceb['Urban population (% of total)'])

        # Turn zip object into list: pops_list
        pops_list = list(pops)

        # Use list comprehension to create new DataFrame column 'Total Urban Population'
        df_pop_ceb['Total Urban Population'] = [int(tup[0] * tup[1] * 0.01) for tup in pops_list]

        # Concatenate DataFrame chunk to the end of data: data
        data = pd.concat([data, df_pop_ceb])

    # Plot urban population data
    data.plot(kind='scatter', x='Year', y='Total Urban Population')
    plt.show()

# Set the filename: fn
fn = 'ind_pop_data.csv'

# Call plot_pop for country code 'CEB'
plot_pop(fn, 'CEB')

# Call plot_pop for country code 'ARB'
plot_pop(fn, 'ARB')
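
The chunk-by-chunk pattern above can be tried without the real data file. The following is a minimal sketch, assuming a tiny made-up CSV held in memory (csv_text and its numbers are invented for illustration, and the column names simply mirror those used in the exercises). It differs from the exercise code in two hedged ways: it calls .copy() on the filtered chunk before adding a column, which avoids pandas' SettingWithCopyWarning, and it collects the chunks in a list and concatenates once at the end rather than growing a DataFrame inside the loop.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for 'ind_pop_data.csv' (values are made up)
csv_text = """CountryCode,Year,Total Population,Urban population (% of total)
CEB,1960,100,50.0
ARB,1960,200,25.0
CEB,1961,110,60.0
ARB,1961,210,30.0
"""

frames = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Work on an independent copy of the filtered rows
    subset = chunk[chunk['CountryCode'] == 'CEB'].copy()

    # Percentage of the total population, cast to integers
    subset['Total Urban Population'] = (
        subset['Total Population'] * subset['Urban population (% of total)'] / 100
    ).astype(int)
    frames.append(subset)

# Concatenate once, after all chunks have been processed
data = pd.concat(frames, ignore_index=True)
print(data)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the filename, and a larger chunksize such as the 1000 used in the exercises would be more typical.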

2 pages
Tugas 2 KPM
No ratings yet
Tugas 2 KPM
2 pages
SUMMER INTERNSHIP REPORT (AutoRecovered)
No ratings yet
SUMMER INTERNSHIP REPORT (AutoRecovered)
19 pages
cpphtp10 07
No ratings yet
cpphtp10 07
111 pages
Haccp Complete
No ratings yet
Haccp Complete
16 pages
Factors That Influence The Distribution of Plants and Animals
No ratings yet
Factors That Influence The Distribution of Plants and Animals
17 pages
电动力学课件：Course1 Vector Analysis
No ratings yet
电动力学课件：Course1 Vector Analysis
29 pages
Solutions 1
No ratings yet
Solutions 1
11 pages
Activity Design District Municipal Festival of Talents 2024
No ratings yet
Activity Design District Municipal Festival of Talents 2024
6 pages
IPB New PGS Proposal Form
No ratings yet
IPB New PGS Proposal Form
3 pages
The Studying Mastermind Guide
No ratings yet
The Studying Mastermind Guide
35 pages
ON A/C 103-103: Reference Qty Designation
No ratings yet
ON A/C 103-103: Reference Qty Designation
21 pages
PEEK-OPTIMA Processing Guide Secured
No ratings yet
PEEK-OPTIMA Processing Guide Secured
0 pages
11plus Y3 English Comprehension Poetry Test & Answers
No ratings yet
11plus Y3 English Comprehension Poetry Test & Answers
9 pages
Voltammetry and Polarography
No ratings yet
Voltammetry and Polarography
46 pages
Year 9 Assessment Support Sample Unit 9aa
No ratings yet
Year 9 Assessment Support Sample Unit 9aa
14 pages
Brown Hauenstein 2005 Interrater Agreement Reconsidered An Alternative To The RWG Indices
No ratings yet
Brown Hauenstein 2005 Interrater Agreement Reconsidered An Alternative To The RWG Indices
20 pages
QKD QKTD
No ratings yet
QKD QKTD
6 pages
Potential AI Based Software Products That Can Be Built by Small Companies
No ratings yet
Potential AI Based Software Products That Can Be Built by Small Companies
2 pages
A Review of Satellite Based Atomic Oxygen Sens - 2023 - Progress in Aerospace SC
No ratings yet
A Review of Satellite Based Atomic Oxygen Sens - 2023 - Progress in Aerospace SC
10 pages
Wins Narrative 2022
100% (1)
Wins Narrative 2022
4 pages
Menstural Cycle DISORDERS
No ratings yet
Menstural Cycle DISORDERS
28 pages
Speeding Up
No ratings yet
Speeding Up
13 pages
Dynamics and Control of Cranes A Review
No ratings yet
Dynamics and Control of Cranes A Review
54 pages
Course Outcome - BCA - BU - Sep - 2023 - Update
No ratings yet
Course Outcome - BCA - BU - Sep - 2023 - Update
24 pages
A Socio Legal Analysis of Voyeurism and Stalking in India
No ratings yet
A Socio Legal Analysis of Voyeurism and Stalking in India
23 pages
AKI 2 Primary Care Bundle Oxfordshire V 1.1
No ratings yet
AKI 2 Primary Care Bundle Oxfordshire V 1.1
1 page
Psychology Chapter 7: Human Memory
No ratings yet
Psychology Chapter 7: Human Memory
26 pages
Li-Fi Technology Seminar
No ratings yet
Li-Fi Technology Seminar
21 pages
