Pandas: Reference Sheet
how='right'
   State       Capital  Population  Highest Point
0  New York    Albany     19540000  Mount Marcy
1  Washington  Olympia     7536000  Mount Rainier
2  Nebraska    NaN             NaN  Panorama Point

how='outer'
   State       Capital  Population  Highest Point
0  Texas       Austin     28700000  NaN
1  New York    Albany     19540000  Mount Marcy
2  Washington  Olympia     7536000  Mount Rainier
3  Nebraska    NaN             NaN  Panorama Point
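The two merge results can be reproduced with a small pure-Python sketch. The input tables below are reconstructed from the rows shown above (an assumption, since the original inputs are not part of this section), and merge_rows is a hypothetical helper that joins on the State column. In pandas itself the equivalent call would be pd.merge(states, peaks, on='State', how='right') or how='outer'. pandas may order outer-join rows differently depending on version and the sort argument, whereas this sketch keeps left-table keys first to match the table.

```python
# Input rows reconstructed from the merge tables (assumption).
states = [
    {"State": "Texas", "Capital": "Austin", "Population": 28700000},
    {"State": "New York", "Capital": "Albany", "Population": 19540000},
    {"State": "Washington", "Capital": "Olympia", "Population": 7536000},
]
peaks = [
    {"State": "New York", "Highest Point": "Mount Marcy"},
    {"State": "Washington", "Highest Point": "Mount Rainier"},
    {"State": "Nebraska", "Highest Point": "Panorama Point"},
]

COLUMNS = ["State", "Capital", "Population", "Highest Point"]


def merge_rows(left, right, key, how):
    """Hypothetical helper: join two lists of dicts on `key` (one row per key assumed)."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    if how == "right":
        # Keep every key from the right table, in the right table's order.
        keys = [row[key] for row in right]
    elif how == "outer":
        # Keep every key from both tables: left keys first, then right-only keys.
        keys = [row[key] for row in left]
        keys += [k for k in right_by_key if k not in left_by_key]
    else:
        raise ValueError("this sketch only handles 'right' and 'outer'")
    merged = []
    for k in keys:
        row = {col: None for col in COLUMNS}  # None stands in for pandas' NaN
        row.update(left_by_key.get(k, {}))
        row.update(right_by_key.get(k, {}))
        merged.append(row)
    return merged


for row in merge_rows(states, peaks, "State", "outer"):
    print(row)
```

Rows with no match on one side keep None in that side's columns, which is the role NaN plays in the pandas tables.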
Register or learn more about other courses in our data curriculum by visiting pragmaticinstitute.com/data-science or calling 480.515.1411.
Reference Sheet
The data
Your data needs to be contained in a two-dimensional feature matrix and, in the case of supervised learning, a one-dimensional label vector. The data has to be numeric and stored in a NumPy array, SciPy sparse matrix, or pandas DataFrame.

Splitting into training data and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
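To make the idea behind train_test_split concrete, here is a minimal pure-Python sketch. simple_train_test_split is a hypothetical helper, not part of scikit-learn; it mirrors the library's defaults of shuffling the rows and, when no size is given, holding out 25% of them for the test set (rounded up).

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.25, seed=None):
    """Hypothetical helper illustrating a shuffled train/test split."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # shuffle row positions, not the data itself
    n_test = math.ceil(test_size * len(X))  # round the test set size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same positions are used for X and y, so each row keeps its label.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Feature matrix with 8 rows and 2 columns; the label equals the first feature.
X = [[i, 2 * i] for i in range(8)]
y = list(range(8))
X_train, X_test, y_train, y_test = simple_train_test_split(X, y, seed=0)
print(len(X_train), len(X_test))  # 6 2
```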
Python Syntax REFERENCE SHEET
POWERED BY THE SCIENTISTS AT THE DATA INCUBATOR
SYNTAX
Creating variables
Variables are created by assigning a value to a name with =:
deg_C = 10.5  # This is a variable
A variable name can consist of letters, numbers and the underscore character (_), but it may not start with a number. Comments are created with a # and are ignored by the Python interpreter.

Common mathematical operations
2 + 3 - addition
1 - 4 - subtraction
2 * 3 - multiplication
4 / 3 - division (always returns a float)
4 // 3 - floor division (rounds down, toward negative infinity)
2 ** 3 - raise to the power
a += 1 - compute a + 1 and assign the result to a
a -= 1 - compute a - 1 and assign the result to a

Functions
Functions are a great way to group related lines of code into a single unit that can be called upon. Here, we define a function with two positional arguments, a and b, and one keyword argument, multiplier, with a default value of 1.

def subtract(a, b, multiplier=1):
    """
    Subtract two numbers and scale the result.
    """
    diff = multiplier * (a - b)
    return diff

Now, we call the function.
In [1]: subtract(1, 2, multiplier=2)
Out[1]: -2
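A short check of the arithmetic operators helps with two common surprises: / always produces a float, and // rounds toward negative infinity rather than toward zero. The variable names are illustrative only.

```python
true_div = 4 / 3     # always a float: 1.3333333333333333
whole = 6 / 3        # 2.0: still a float even when the result is whole
floor_div = 4 // 3   # 1
neg_floor = -7 // 2  # -4: floor division rounds down, not toward zero
power = 2 ** 3       # 8

a = 10
a += 1  # a is now 11
a -= 1  # a is back to 10

print(true_div, whole, floor_div, neg_floor, power, a)
```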
Common built-in functions
print(temp_data) - print/display the value of temp_data
len(temp_data) - returns the number of values in temp_data (works on containers such as lists, tuples and strings)
sum(temp_data) - returns the sum of the values of the iterable
min(temp_data) - returns the minimum value of the iterable
max(temp_data) - returns the maximum value of the iterable
sorted(temp_data) - returns a new list of the sorted values of temp_data
range(start, end, step) - returns an iterable from start to end (exclusive) using a step size of step (start defaults to 0 and step defaults to 1)

Boolean logic
These operations return either True or False, depending on the values of the two variables. They are often used in conjunction with if/elif statements.
a < b - is a less than b
a > b - is a greater than b
a <= b - is a less than or equal to b
a >= b - is a greater than or equal to b
a == b - do a and b have the same value
a != b - do a and b not have the same value
a is b - is a the same object as b
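The built-ins and comparisons above can be tried on the sheet's temp_data list. The sketch also shows the difference between == (equal values) and is (the same object); the lists a, b and c are illustrative.

```python
temp_data = [10.5, 12.2, 5, 8.7, 1]

print(len(temp_data))     # 5
print(min(temp_data))     # 1
print(max(temp_data))     # 12.2
print(sorted(temp_data))  # [1, 5, 8.7, 10.5, 12.2]; temp_data itself is unchanged
print(sum(temp_data))     # about 37.4; floating-point rounding may add trailing digits
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]; wrap range in list() to see its values

# == compares values, while `is` compares object identity.
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a == b)  # True: same contents
print(a is b)  # False: two separate list objects
print(c is a)  # True: c is another name for the same list
```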
© 2020 Pragmatic Institute, LLC
Loops
Loops are a way to repeatedly execute a block of code. There are two types of loops: for and while loops. For loops are used to loop through every value of an iterable, like a list or tuple. While loops are used to continually execute a block of code while a provided condition is still true.

for temp in temp_data:
    print(temp)

count = 0
while count < 10:
    print(count)
    count += 1

if/elif/else blocks
if/elif/else blocks let us control the behavior of our program based on conditions; for example, deciding what value to assign to a variable based on the value of another variable. At a minimum, you need one condition to test, using if. Multiple conditions can be tested using multiple elif statements. The code in the else block, which is optional, is run when none of the tested conditions are met.

if amount < 5:
    rate = 0.1
elif amount >= 5 and amount < 10:
    rate = 0.2
else:
    rate = 0.25
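One way to reuse the rate tiers (0.1 below 5, 0.2 from 5 up to but not including 10, and 0.25 otherwise) is to wrap them in a function and apply it with a loop. get_rate and the sample amounts are hypothetical; the sketch also shows that elif conditions are only checked after the earlier ones have failed.

```python
def get_rate(amount):
    """Hypothetical helper: 0.1 below 5, 0.2 from 5 up to 10, 0.25 otherwise."""
    if amount < 5:
        rate = 0.1
    elif amount < 10:  # only reached when amount >= 5, so no lower bound is needed
        rate = 0.2
    else:
        rate = 0.25
    return rate


amounts = [2, 5, 9.99, 10, 42]

# for loop: visit every value of the list
rates = []
for amount in amounts:
    rates.append(get_rate(amount))
print(rates)  # [0.1, 0.2, 0.2, 0.25, 0.25]

# while loop: same result, but the index is managed by hand
rates_while = []
i = 0
while i < len(amounts):
    rates_while.append(get_rate(amounts[i]))
    i += 1
```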
DATA STRUCTURES
Strings
Strings are sequences of characters and are great for representing text. They're created using either single or double quotes. They can be indexed, but strings are immutable: their characters cannot be changed in place. Strings are iterables; iterating over a string yields each of its characters.

sentence = 'The quick brown fox jumped over the lazy dog.'

Common operations with strings and example usage:
sentence.lower() - returns a new string with all characters in lowercase
sentence.upper() - returns a new string with all characters in uppercase
sentence.startswith('The') - returns True or False depending on whether the string starts with 'The'
sentence.endswith('?') - returns True or False depending on whether the string ends with '?'
sentence.split() - returns a list resulting from splitting the string by a provided separator; splits on whitespace if no argument is passed
sentence.strip() - returns a new string with leading and trailing whitespace removed
'fox' in sentence - returns True if 'fox' is present in sentence
'taco ' + 'cat' - returns a new string by concatenating the two strings
f"My name is {name} and I'm {age} years old." - returns a string with the values of the variables name and age substituted into {name} and {age}, respectively
sentence.replace("brown", "red") - returns a new string with every occurrence of "brown" replaced by "red"
len(sentence) - returns the number of characters in the string

Lists
Lists are ordered collections of Python objects. The items of a list do not have to be the same data type; for example, you can store strings and integers inside the same list. Lists are mutable; they can be altered after their creation. Since they are ordered, they can be indexed by position. Note that Python uses zero indexing, so the "first" element is at index 0. Lists are created using square brackets [].

temp_data = [10.5, 12.2, 5, 8.7, 1]

Common operations on lists and example usage:
temp_data.append(2.5) - adds 2.5 to the end of the list
temp_data.sort() - sorts the elements of the list in place in ascending order. Use reverse=True to sort in descending order.
temp_data.remove(12.2) - removes the first occurrence of 12.2 from the list
temp_data.pop() - removes and returns the last element of the list
temp_data[0] - access the value at position 0
temp_data[:3] - access the first three values, positions 0 to 3 (inclusive-exclusive)
temp_data[-1] - access the last element
temp_data[1:4:2] - access values from position 1 (inclusive) to 4 (exclusive) with a step size of 2
len(temp_data) - returns the number of values in the list
sum(temp_data) - returns the sum of the values of the list
min(temp_data) - returns the minimum value of the list
max(temp_data) - returns the maximum value of the list
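The string and list operations above can be exercised side by side, which makes the immutable/mutable contrast visible. The age value is hypothetical, and the comments track the list's contents after each step.

```python
# --- Strings ---
sentence = 'The quick brown fox jumped over the lazy dog.'

words = sentence.split()                # split on whitespace
print(len(words), words[0], words[-1])  # 9 The dog.

new_sentence = sentence.replace("brown", "red")
print(new_sentence)  # The quick red fox jumped over the lazy dog.
print(sentence)      # unchanged: replace returned a new string

try:
    sentence[0] = 't'  # strings are immutable, so item assignment fails
    immutable = False
except TypeError:
    immutable = True

name, age = 'Clarissa', 30  # hypothetical values for the f-string
greeting = f"My name is {name} and I'm {age} years old."

# --- Lists ---
temp_data = [10.5, 12.2, 5, 8.7, 1]
first_three = temp_data[:3]   # [10.5, 12.2, 5]
stepped = temp_data[1:4:2]    # [12.2, 8.7]: positions 1 and 3
temp_data.append(2.5)         # [10.5, 12.2, 5, 8.7, 1, 2.5]
temp_data.remove(12.2)        # [10.5, 5, 8.7, 1, 2.5]
last = temp_data.pop()        # 2.5; the list is now [10.5, 5, 8.7, 1]
returned = temp_data.sort()   # sorts in place and returns None: [1, 5, 8.7, 10.5]
temp_data.sort(reverse=True)  # [10.5, 8.7, 5, 1]
temp_data[0] = 99.9           # unlike strings, lists can be changed in place
print(temp_data)              # [99.9, 8.7, 5, 1]
```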
Tuples
Tuples are similar to lists but they are immutable; they cannot be modified. As with lists, they can be indexed by position. Tuples are created using parentheses () (a one-element tuple needs a trailing comma, as in (5,)).

array_shape = (100, 20)

Sets
Sets are collections of unique values. They're a great data structure to use when you want to keep track of only unique values. The members of a set need to be immutable (hashable): for example, lists are not allowed but tuples are. A set can be created by passing an iterable to set() or directly using curly braces; note that {} on its own creates an empty dictionary, so use set() for an empty set.

even_numbers = set([x for x in range(100)
                    if x % 2 == 0])
squares = {1, 1, 2, 4, 2, 9, 16, 25, 36,
           49, 64, 81, 100}

even_numbers.add(100) - add 100 to the set even_numbers
even_numbers.difference(squares) - returns a set of the elements of even_numbers that are not in squares
even_numbers.union({1, 3, 5, 7, 9}) - returns a set that is the union of the two sets
squares.intersection(even_numbers) - returns a set of the common elements
1 in even_numbers - returns True or False depending on whether 1 is a member of the set even_numbers

Dictionary
Dictionaries store data in key-value pairs. Values can be indexed using the key associated with the value. There's no restriction on what can be a value, but keys must be hashable, which in practice means immutable types: for example, strings, numerics and tuples can be keys. Dictionaries are created using curly braces {}, with each key and value separated by a colon :. Iterating over a dictionary yields the keys.

customer_data = {
    'name': 'Clarissa',
    'account_id': 100045,
    'account_balance': 4515.76,
    'open_account': True
}

customer_data['name'] - access the value associated with 'name'
customer_data['telephone'] = None - create a new key-value pair 'telephone': None
customer_data['telephone'] = '555-1234' - update the value of key 'telephone'
del customer_data['telephone'] - delete the key-value pair for 'telephone'
'age' in customer_data - returns True/False depending on whether the key 'age' is in the dictionary
customer_data.get('age', -1) - returns the value of key 'age' if it exists, otherwise the default -1. Without the second argument, get returns None for a missing key.
customer_data.keys() - returns an iterable over all keys
customer_data.items() - returns an iterable over all key-value pairs
customer_data.values() - returns an iterable over all values
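The set, tuple and dictionary operations above can be run together using the sheet's own example data. The sketch also shows why tuples, being immutable, can be set members and dictionary keys while lists cannot; the extra (3, 4) tuple is illustrative.

```python
# --- Sets ---
even_numbers = set([x for x in range(100) if x % 2 == 0])
squares = {1, 1, 2, 4, 2, 9, 16, 25, 36, 49, 64, 81, 100}
print(len(squares))  # 11: the repeated 1 and 2 are stored only once

even_numbers.add(100)
with_odds = even_numbers.union({1, 3, 5, 7, 9})  # a new set; even_numbers is unchanged
only_even = even_numbers.difference(squares)     # even numbers that are not in squares
common = squares.intersection(even_numbers)
print(sorted(common))                    # [2, 4, 16, 36, 64, 100]
print(1 in even_numbers, 1 in with_odds)  # False True

# --- Tuples as set members ---
array_shape = (100, 20)
shapes = {array_shape, (3, 4)}  # allowed: a tuple of numbers is hashable
try:
    bad = {[100, 20]}  # a list is mutable, so it cannot be a set member
except TypeError as err:
    print('TypeError:', err)  # TypeError: unhashable type: 'list'

# --- Dictionaries ---
customer_data = {
    'name': 'Clarissa',
    'account_id': 100045,
    'account_balance': 4515.76,
    'open_account': True,
}
missing = customer_data.get('age')              # None: the key does not exist
age_or_default = customer_data.get('age', -1)   # -1: the supplied default
customer_data['telephone'] = None               # create a new key-value pair
customer_data['telephone'] = '555-1234'         # update its value
for key, value in customer_data.items():
    print(key, value)
del customer_data['telephone']
print(list(customer_data))  # keys in insertion order (guaranteed since Python 3.7)
```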
* * *
REFERENCE SHEET
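CREATE TEMP TABLE big AS
SELECT * FROM transactions
WHERE items > 100;

Replace TEMP TABLE with TEMP VIEW to get a live-updated VIEW: a view re-runs its query each time it is read, so it reflects later changes to transactions.

RIGHT JOIN does the same for the second table: every row of the second table is kept, with NULLs where the first table has no match.

SELECT customer, items, state
FROM customers RIGHT JOIN transactions
ON customer_id = id;

FULL JOIN includes all unmatched rows from both tables.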
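Python's built-in sqlite3 module can illustrate the TEMP TABLE versus TEMP VIEW distinction. The column layout of transactions (customer_id, items) and its rows are assumptions, and the view is named big_view (hypothetical) so it can coexist with the table big. The sketch avoids RIGHT and FULL JOIN because SQLite builds older than 3.39.0 do not support them.

```python
import sqlite3

# In-memory database with a hypothetical transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, items INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(10, 150), (11, 20)],
)

# Snapshot: the temp table copies the matching rows once, at creation time.
conn.execute("CREATE TEMP TABLE big AS SELECT * FROM transactions WHERE items > 100")
# Live: the temp view re-runs its query every time it is read.
conn.execute("CREATE TEMP VIEW big_view AS SELECT * FROM transactions WHERE items > 100")

# Add another large transaction after both objects exist.
conn.execute("INSERT INTO transactions VALUES (12, 500)")

table_count = conn.execute("SELECT COUNT(*) FROM big").fetchone()[0]
view_count = conn.execute("SELECT COUNT(*) FROM big_view").fetchone()[0]
print(table_count, view_count)  # 1 2
conn.close()
```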