Python Data Science Toolbox
The list comprehension is [doc[0] for doc in doctor] and produces the
list ['h', 'c', 'c', 't', 'w'].
The list comprehension is [doc[0] in doctor] and produces the list ['h', 'c',
'c', 't', 'w'].

List comprehension over iterables
You know that list comprehensions can be built over iterables. Given the
following objects below, which of these can we build list comprehensions
over?

doctor = ['house', 'cuddy', 'chase', 'thirteen', 'wilson']
range(50)
underwood = 'After all, we are nothing more or less than what we choose to reveal.'
jean = '24601'
flash = ['jay garrick', 'barry allen', 'wally west', 'bart allen']

You can build list comprehensions over all the objects except the string
lists doctor and flash.
You can build list comprehensions over all the objects except range(50).
You can build list comprehensions over all the objects except the integer
object valjean.

Writing list comprehensions
Using the range of numbers from 0 to 9 as your iterable and i as your
iterator variable, write a list comprehension that produces a list of
numbers consisting of the squared values of i.

# Create list comprehension: squares
squares = [i**2 for i in range(0, 10)]

Nested list comprehensions
In the inner list comprehension - that is, the output expression of the
nested list comprehension - create a list of values
from 0 to 4 using range(). Use col as the iterator variable.
In the iterable part of your nested list comprehension, use range() to
count 5 rows - that is, create a list of values from 0 to 4. Use row as
the iterator variable; note that you won’t be needing this variable to
create values in the list of lists.

# Create a 5 x 5 matrix using a list of lists: matrix
matrix = [[col for col in range(5)] for row in range(5)]
# Print the matrix
for row in matrix:
    print(row)

Using conditionals in comprehensions (1)
Use member as the iterator variable in the list comprehension. For the
conditional, use len() to evaluate the iterator variable. Note that you
only want strings with 7 characters or more.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member for member in fellowship if len(member) >= 7]
# Print the new list
print(new_fellowship)

Using conditionals in comprehensions (2)
In the output expression, keep the string as-is if the number of
characters is >= 7, else replace it with an empty string - that is, '' or "".

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create list comprehension: new_fellowship
new_fellowship = [member if len(member) >= 7 else '' for member in fellowship]
# Print the new list
print(new_fellowship)

Dict comprehensions
Create a dict comprehension where the key is a string in fellowship and the
value is the length of the string. Remember to use the syntax <key> :
<value> in the output expression part of the comprehension to create the
members of the dictionary. Use member as the iterator variable.

# Create a list of strings: fellowship
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# Create dict comprehension: new_fellowship
new_fellowship = {member: len(member) for member in fellowship}
# Print the new dictionary
print(new_fellowship)

List comprehensions vs. generators
To help with that task, the following code has been pre-loaded in the
environment:

# List of strings
fellowship = ['frodo', 'samwise', 'merry', 'aragorn', 'legolas', 'boromir', 'gimli']
# List comprehension
fellow1 = [member for member in fellowship if len(member) >= 7]
# Generator expression
fellow2 = (member for member in fellowship if len(member) >= 7)

Try to play around with fellow1 and fellow2 by figuring out their types and
printing out their values. Based on your observations and what you can recall
from the video, select from the options below the best description for the
difference between list comprehensions and generators.

List comprehensions and generators are not different at all; they are just
different ways of writing the same thing.
A list comprehension produces a list as output, a generator produces a
generator object.
A list comprehension produces a list as output that can be iterated over, a
generator produces a generator object that can’t be iterated over.

print(next(result))
print(next(result))
print(next(result))
# Print the rest of the values
for value in result:
    print(value)

Changing the output in generator expressions
Write a generator expression that will generate the lengths of each
string in lannister. Use person as the iterator variable. Assign the
result to lengths.
Supply the correct iterable in the for loop for printing the values in the
generator object.

# Create a list of strings: lannister
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']
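
The lannister code stops before the generator expression itself. As a minimal sketch of one way to finish it (not necessarily the course's own solution), the snippet below builds lengths with person as the iterator variable and loops over it. The names collected and leftover are not part of the exercise; they are added here only to show that a generator object can be iterated over, but only once.

```python
# Sketch only: one possible completion, not the original solution.
lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

# Generator expression: no length is computed until a value is requested
lengths = (len(person) for person in lannister)

# Iterating over the generator object yields one length at a time
collected = []  # illustrative helper, not part of the exercise
for value in lengths:
    print(value)
    collected.append(value)

# The generator is now exhausted, so a second pass produces nothing
leftover = list(lengths)  # illustrative helper, not part of the exercise
print(leftover)
```

Swapping the parentheses for square brackets would build the whole list in memory at once instead of producing values on demand.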
Use open() to bind the csv file 'world_dev_ind.csv' as file in the
context manager.
Complete the for loop so that it iterates 1000 times to perform the
loop body and process only the first 1000 rows of data of the file.

# Open a connection to the file
with open('world_dev_ind.csv') as file:
    # ...
        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1
# Print the resulting dictionary
print(counts_dict)
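
Only the start and end of the counting loop appear in the code for this exercise, so here is a hedged sketch of how the whole pattern might look. It assumes the first line of the file is a header and that fields are separated by plain commas. The function name count_first_column and the in-memory sample (made-up rows standing in for 'world_dev_ind.csv') are illustrative, not part of the exercise.

```python
import io


def count_first_column(file, n_rows=1000):
    """Count first-column values over at most n_rows data rows.

    Hypothetical helper; assumes a header line and comma-separated fields.
    """
    counts_dict = {}
    file.readline()  # assumption: skip the header line
    for _ in range(n_rows):
        line = file.readline()
        if not line:  # stop early if the file has fewer rows
            break
        first_col = line.rstrip('\n').split(',')[0]
        # If the value is already in the dict, add 1 to its count
        if first_col in counts_dict:
            counts_dict[first_col] += 1
        # Else, add to the dict and set value to 1
        else:
            counts_dict[first_col] = 1
    return counts_dict


# Made-up rows for illustration; a real call would pass the opened csv file
sample = io.StringIO(
    "CountryName,CountryCode\n"
    "Arab World,ARB\n"
    "Arab World,ARB\n"
    "Central Europe and the Baltics,CEB\n"
)
counts = count_first_column(sample, n_rows=1000)
print(counts)
```

With the real file, the same function could be called inside the context manager, for example `with open('world_dev_ind.csv') as file: counts = count_first_column(file)`.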
Writing an iterator to load data in chunks (2)
Use pd.read_csv() to read in the file 'ind_pop_data.csv' in chunks
of size 1000. Assign the result to urb_pop_reader.
Get the first DataFrame chunk from the iterable urb_pop_reader and
assign this to df_urb_pop.
Select only the rows of df_urb_pop that have
a 'CountryCode' of 'CEB'. To do this, compare
whether df_urb_pop['CountryCode'] is equal to 'CEB' within the
square brackets in df_urb_pop[____].
Using zip(), zip together the 'Total Population' and 'Urban population
(% of total)' columns of df_pop_ceb. Assign the resulting zip object
to pops.

Writing an iterator to load data in chunks (3)
Write a list comprehension to generate a list of values
from pops_list for the new column 'Total Urban Population'.
The output expression should be the product of the first and second
element in each tuple in pops_list. Because the 2nd element is a
percentage, you also need to either multiply the result by 0.01 or
divide it by 100. In addition, note that the column 'Total Urban
Population' should only be able to take on integer values. To ensure
this, make sure you cast the output expression to an integer with int().
Create a scatter plot where the x-axis shows values from
the 'Year' column and the y-axis shows values from the 'Total Urban
Population' column.
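
The zip() and list comprehension steps can be tried without the dataset. In this rough sketch, two plain lists stand in for the 'Total Population' and 'Urban population (% of total)' columns of df_pop_ceb; the numbers are made up for illustration and the variable names total_population, urban_percent and total_urban are assumptions, not names from the exercise.

```python
# Made-up stand-ins for the two DataFrame columns (not real data)
total_population = [1000, 2000, 3000]
urban_percent = [50.0, 60.0, 75.0]

# Zip the two columns together and turn the zip object into a list of tuples
pops = zip(total_population, urban_percent)
pops_list = list(pops)

# Product of both tuple elements, divided by 100 because the second one
# is a percentage, then cast to int so only integer values are stored
total_urban = [int(tup[0] * tup[1] / 100) for tup in pops_list]
print(total_urban)
```

Note that int() truncates rather than rounds, so any fractional part of the product is dropped.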
Writing an iterator to load data in chunks (4)
Initialize an empty DataFrame data using pd.DataFrame().
In the for loop, iterate over urb_pop_reader to be able to process all
the DataFrame chunks in the dataset.
Concatenate data and df_pop_ceb by passing a list of the DataFrames
to pd.concat().

import pandas as pd
import matplotlib.pyplot as plt

# Initialize reader object: urb_pop_reader
urb_pop_reader = pd.read_csv('ind_pop_data.csv', chunksize=1000)
# Initialize empty DataFrame: data
data = pd.DataFrame()
# Iterate over each DataFrame chunk
for df_urb_pop in urb_pop_reader:
    # Check out specific country: df_pop_ceb
    df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']
    # ...
    # Concatenate DataFrame chunk to the end of data: data
    data = pd.concat([data, df_pop_ceb])
# Plot urban population data
data.plot(kind='scatter', x='Year', y='Total Urban Population')
plt.show()

Writing an iterator to load data in chunks (5)
Define the function plot_pop() that has two arguments: first
is filename for the file to process and second is country_code for the
country to be processed in the dataset.
Call plot_pop() to process the data for country code 'CEB' in the
file 'ind_pop_data.csv'.
Call plot_pop() to process the data for country code 'ARB' in the
file 'ind_pop_data.csv'.
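
To see what chunked processing for one country code involves, here is a standard-library sketch of the same idea. It deliberately swaps pd.read_csv(..., chunksize=1000) for csv.DictReader plus itertools.islice, and it returns (year, total urban population) pairs instead of plotting. The function names read_in_chunks and urban_pop_by_year are hypothetical, and the sample rows are made up rather than taken from 'ind_pop_data.csv'; only the column names come from the exercises.

```python
import csv
import io
from itertools import islice


def read_in_chunks(file, chunksize=1000):
    """Yield lists of up to `chunksize` row dicts from an open CSV file."""
    reader = csv.DictReader(file)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:  # no rows left
            break
        yield chunk


def urban_pop_by_year(file, country_code, chunksize=1000):
    """Collect (year, total urban population) pairs for one country code."""
    data = []
    for chunk in read_in_chunks(file, chunksize):
        for row in chunk:
            if row['CountryCode'] == country_code:
                total = float(row['Total Population'])
                percent = float(row['Urban population (% of total)'])
                # Percentage -> divide by 100, then cast to int
                data.append((int(row['Year']), int(total * percent / 100)))
    return data


# Made-up rows for illustration only
sample = io.StringIO(
    "CountryCode,Year,Total Population,Urban population (% of total)\n"
    "CEB,1960,1000,50.0\n"
    "ARB,1960,2000,25.0\n"
    "CEB,1961,3000,75.0\n"
)
result = urban_pop_by_year(sample, 'CEB', chunksize=2)
print(result)
```

A pandas version of plot_pop() would follow the same shape: loop over the chunk reader, filter each chunk by country_code, accumulate the rows, then draw the scatter plot once at the end.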