Python
Cheat Sheet
Python | Pandas
Data Analysis
Data Visualization
by Frank Andrade
Python Basics
Cheat Sheet

Here you will find all the Python core concepts you need to
know before learning any third-party library.

Data Types

Integers (int): 1
Float (float): 1.2
String (str): "Hello World"
Boolean: True/False
List: [value1, value2]
Dictionary: {key1:value1, key2:value2, ...}

Variables

Variable assignment:
message_1 = "I'm learning Python"
message_2 = "and it's fun!"

String concatenation (+ operator):
message_1 + ' ' + message_2

String concatenation (f-string):
f'{message_1} {message_2}'

List

Creating a list:
countries = ['United States', 'India',
             'China', 'Brazil']

Creating a new list:
numbers = [4, 3, 10, 7, 1, 2]

Sorting a list (in place):
>>> numbers.sort()
>>> numbers
[1, 2, 3, 4, 7, 10]
>>> numbers.sort(reverse=True)
>>> numbers
[10, 7, 4, 3, 2, 1]

Update a value in a list:
>>> numbers[0] = 1000
>>> numbers
[1000, 7, 4, 3, 2, 1]

Copying a list:
new_list = countries[:]
new_list_2 = countries.copy()
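The sorting and copying entries can be tried together in a short sketch. It reuses the `numbers` and `countries` lists from this sheet; the built-in `sorted()` is not part of the sheet and is added here only to contrast it with the in-place `sort()`.

```python
numbers = [4, 3, 10, 7, 1, 2]

# sort() reorders the list in place and returns None.
result = numbers.sort()

# sorted() leaves the original untouched and returns a new list.
descending = sorted(numbers, reverse=True)

countries = ['United States', 'India', 'China', 'Brazil']

# Both forms make a shallow copy: a new list holding the same elements.
new_list = countries[:]
new_list_2 = countries.copy()

# Changing the copy does not change the original list.
new_list[0] = 'Canada'

print(result)        # None
print(numbers)       # [1, 2, 3, 4, 7, 10]
print(descending)    # [10, 7, 4, 3, 2, 1]
print(countries[0])  # United States
```

Assigning `new_list = countries` without `[:]` or `.copy()` would only create a second name for the same list, so the update would show up in both.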
Create an empty list:
my_list = []

Indexing:
>>> countries[0]
'United States'
>>> countries[3]
'Brazil'
>>> countries[-1]
'Brazil'

Slicing:
>>> countries[0:3]
['United States', 'India', 'China']
>>> countries[1:]
['India', 'China', 'Brazil']
>>> countries[:2]
['United States', 'India']

Adding elements to a list:
countries.append('Canada')
countries.insert(0,'Canada')

Nested list:
nested_list = [countries, countries_2]

Remove element:
countries.remove('United States')
countries.pop(0) # removes and returns value
del countries[0]

Built-in Functions

Print an object:
print("Hello World")

Return the length of x:
len(x)

Return the minimum value:
min(x)

Return the maximum value:
max(x)

Return a sequence of numbers:
range(x1,x2,n) # from x1 up to (but not including) x2, in steps of n

Convert x to a string:
str(x)

Convert x to an integer/float:
int(x)
float(x)

Convert x to a list:
list(x)

Numeric Operators

+   Addition
-   Subtraction
*   Multiplication
/   Division
**  Exponent
%   Modulus
//  Floor division

Comparison Operators

==  Equal to
!=  Not equal to
>   Greater than
<   Less than
>=  Greater than or equal to
<=  Less than or equal to

String methods

string.upper(): converts to uppercase
string.lower(): converts to lowercase
string.title(): converts to title case
string.count('l'): counts how many times "l" appears
string.find('h'): position of the first occurrence of "h"
string.replace('o', 'u'): replaces "o" with "u"
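As a quick illustration of the string methods, the sketch below applies each of them to a sample value. The text "hello world" is only an assumption made for the example.

```python
string = "hello world"

upper = string.upper()               # 'HELLO WORLD'
title = string.title()               # 'Hello World'
l_count = string.count('l')          # 3
h_pos = string.find('h')             # 0 (positions start at 0)
missing = string.find('z')           # -1 when the character is not found
replaced = string.replace('o', 'u')  # 'hellu wurld'

# Strings are immutable: every method returns a new string.
print(string)    # hello world
print(replaced)  # hellu wurld
```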
Dictionary

Creating a dictionary:
my_data = {'name':'Frank', 'age':26}

Create an empty dictionary:
my_dict = {}

Get value of key "name":
>>> my_data["name"]
'Frank'

Get the keys:
>>> my_data.keys()
dict_keys(['name', 'age'])

Get the values:
>>> my_data.values()
dict_values(['Frank', 26])

Get the key-value pairs:
>>> my_data.items()
dict_items([('name', 'Frank'), ('age', 26)])

Adding/updating items in a dictionary:
my_data['height']=1.7
my_data.update({'height':1.8,
                'languages':['English', 'Spanish']})
>>> my_data
{'name': 'Frank',
 'age': 26,
 'height': 1.8,
 'languages': ['English', 'Spanish']}

Remove an item:
my_data.pop('height')
del my_data['languages']
my_data.clear() # removes all items

Copying a dictionary:
new_dict = my_data.copy()

If Statement

Conditional test:
if <condition>:
    <code>
elif <condition>:
    <code>
...
else:
    <code>

Example:
if age>=18:
    print("You're an adult!")

Conditional test with list:
if <value> in <list>:
    <code>

Loops

For loop:
for <variable> in <list>:
    <code>

For loop and enumerate list elements:
for i, element in enumerate(<list>):
    <code>

For loop and obtain dictionary elements:
for key, value in my_dict.items():
    <code>

While loop:
while <condition>:
    <code>

Data Validation

Try-except:
try:
    <code>
except <error>:
    <code>

Loop control statement:
break: stops loop execution
continue: jumps to next iteration
pass: does nothing

Functions

Create a function:
def function(<params>):
    <code>
    return <data>

Modules

Import module:
import module
module.method()

OS module:
import os
os.getcwd()
os.listdir()
os.makedirs(<path>)

Special Characters

#   Comment
\n  New Line

Boolean Operators       Boolean Operators (Pandas)
and  logical AND        &  logical AND
or   logical OR         |  logical OR
not  logical NOT        ~  logical NOT

Below are my guides, tutorials
and complete Python courses:
- Medium Guides
- YouTube Tutorials
- Udemy Courses
Made by Frank Andrade frank-andrade.medium.com
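The Dictionary, Loops and Data Validation entries from this page can be combined in one small sketch. The dictionary mirrors `my_data` from the sheet; the idea of collecting only the numeric values is an assumption made for the example.

```python
my_data = {'name': 'Frank', 'age': 26, 'height': 1.8}

numeric_items = {}
for key, value in my_data.items():
    try:
        numeric_items[key] = float(value)
    except ValueError:
        # float('Frank') raises ValueError, so this item is skipped.
        continue

print(numeric_items)  # {'age': 26.0, 'height': 1.8}

# enumerate() yields a running index next to each key.
for i, key in enumerate(numeric_items):
    if numeric_items[key] >= 18:
        print(i, key)  # 0 age
        break          # stop at the first match
```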
Pandas
Cheat Sheet

Pandas provides data analysis tools for Python. All of the
following code examples refer to the dataframe below.

df =
        col1  col2
    A      1     4
    B      2     5
    C      3     6

axis 0 runs down the rows (A, B, C); axis 1 runs across the columns (col1, col2).

Selecting rows and columns

Select single column:
df['col1']

Select multiple columns:
df[['col1', 'col2']]

Show first n rows:
df.head(2)

Show last n rows:
df.tail(2)

Select rows by index values:
df.loc['A']
df.loc[['A', 'B']]

Select rows by position:
df.iloc[1]
df.iloc[1:]

Merge multiple data frames horizontally:
df3 = pd.DataFrame([[1, 7],[8,9]],
                   index=['B', 'D'],
                   columns=['col1', 'col3'])
#df3: new dataframe

Only merge complete rows (INNER JOIN):
df.merge(df3)

Keep all rows of the left data frame (LEFT OUTER JOIN):
df.merge(df3, how='left')

Keep all rows of the right data frame (RIGHT OUTER JOIN):
df.merge(df3, how='right')

Preserve all values (OUTER JOIN):
df.merge(df3, how='outer')

Merge rows by index:
df.merge(df3,left_index=True,
         right_index=True)
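To see how the join types differ, the sketch below runs each merge on the `df` and `df3` frames defined in this sheet and compares the row counts. It assumes pandas is installed; without an `on` argument, `merge` joins on the columns the two frames share, here `col1`.

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])
df3 = pd.DataFrame([[1, 7], [8, 9]],
                   index=['B', 'D'],
                   columns=['col1', 'col3'])

inner = df.merge(df3)               # only col1 == 1 appears in both frames
left = df.merge(df3, how='left')    # all rows of df, NaN where df3 has no match
right = df.merge(df3, how='right')  # all rows of df3
outer = df.merge(df3, how='outer')  # col1 values 1, 2, 3 and 8

# Merging on the index keeps only the shared label 'B'; the
# overlapping column name col1 gets the suffixes _x and _y.
by_index = df.merge(df3, left_index=True, right_index=True)

print(len(inner), len(left), len(right), len(outer))  # 1 3 2 4
print(list(by_index.columns))  # ['col1_x', 'col2', 'col1_y', 'col3']
```

Note that a column-based merge returns a fresh 0, 1, 2, ... index, so the labels A, B, C survive only in the index-based variant.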
Getting Started

Import pandas:
import pandas as pd

Create a series:
s = pd.Series([1, 2, 3],
              index=['A', 'B', 'C'],
              name='col1')

Create a dataframe:
data = [[1, 4], [2, 5], [3, 6]]
index = ['A', 'B', 'C']
df = pd.DataFrame(data, index=index,
                  columns=['col1', 'col2'])

Read a csv file with pandas:
df = pd.read_csv('filename.csv')

Advanced parameters:
df = pd.read_csv('filename.csv', sep=',',
                 names=['col1', 'col2'],
                 index_col=0,
                 encoding='utf-8',
                 nrows=3)

Data wrangling

Filter by value:
df[df['col1'] > 1]

Sort by one column:
df.sort_values('col1')

Sort by multiple columns:
df.sort_values(['col1', 'col2'],
               ascending=[False, True])

Identify duplicate rows:
df.duplicated()

Identify unique values in a column:
df['col1'].unique()

Swap rows and columns:
df = df.transpose()
df = df.T

Drop a column:
df = df.drop('col1', axis=1)

Clone a data frame:
clone = df.copy()

Connect multiple data frames vertically:
df2 = df + 5 #new dataframe
pd.concat([df,df2])

Fill NaN values:
df.fillna(0)

Apply your own function:
def func(x):
    return 2**x
df.apply(func)

Arithmetic and statistics

Add to all values:
df + 10

Sum over columns:
df.sum()

Cumulative sum over columns:
df.cumsum()

Mean over columns:
df.mean()

Standard deviation over columns:
df.std()

Count unique values:
df['col1'].value_counts()

Summarize descriptive statistics:
df.describe()
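A few of the wrangling and statistics calls are easier to read next to their results. The sketch below rebuilds the sheet's `df` and applies them; it assumes pandas is installed and uses only calls listed on this page.

```python
import pandas as pd

df = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
                  index=['A', 'B', 'C'],
                  columns=['col1', 'col2'])

df2 = df + 5                             # adds 5 to every value
stacked = pd.concat([df, df2])           # 6 rows; labels A, B, C appear twice
filtered = stacked[stacked['col1'] > 1]  # drops only the row with col1 == 1

totals = df.sum()      # one value per column: col1 6, col2 15
running = df.cumsum()  # running total down each column
means = df.mean()      # col1 2.0, col2 5.0

print(stacked.shape)             # (6, 2)
print(len(filtered))             # 5
print(running['col1'].tolist())  # [1, 3, 6]
```

These reductions work down each column (axis 0) by default; passing `axis=1` would compute them across each row instead.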
Hierarchical indexing

Create hierarchical index:
df.stack()

Dissolve hierarchical index:
df.unstack()

Aggregation

Create group object:
g = df.groupby('col1')

Iterate over groups:
for i, group in g:
    print(i, group)

Aggregate groups:
g.sum()
g.prod()
g.mean()
g.std()
g.describe()

Select columns from groups:
g['col2'].sum()
g[['col2', 'col3']].sum()

Transform values:
import numpy as np
g.transform(np.log)

Apply a list function on each group:
def strsum(group):
    return ''.join([str(x) for x in group.values])
g['col2'].apply(strsum)

Data export

Data as NumPy array:
df.values

Save data as CSV file:
df.to_csv('output.csv', sep=",")

Format a dataframe as tabular string:
df.to_string()

Convert a dataframe to a dictionary:
df.to_dict()

Save a dataframe as an Excel table:
df.to_excel('output.xlsx')

Pivot and Pivot Table

Read CSV file:
df_gdp = pd.read_csv('gdp.csv')

The pivot() method:
df_gdp.pivot(index="year",
             columns="country",
             values="gdppc")

Read Excel file:
df_sales=pd.read_excel(
    'supermarket_sales.xlsx')

Make pivot table:
df_sales.pivot_table(index='Gender',
                     aggfunc='sum')

Make a pivot table that shows how much male and
female customers spend in each category:
df_sales.pivot_table(index='Gender',
                     columns='Product line',
                     values='Total',
                     aggfunc='sum')

Visualization

The plots below assume a dataframe shaped like
the result of df_gdp.pivot() (see the pivot() method).

Import matplotlib:
import matplotlib.pyplot as plt

Start a new diagram:
plt.figure()

Scatter plot:
df.plot(kind='scatter', x='col1', y='col2') # scatter needs x and y columns

Bar plot:
df.plot(kind='bar',
        xlabel='data1',
        ylabel='data2')

Lineplot:
df.plot(kind='line',
        figsize=(8,4))

Boxplot:
df['col1'].plot(kind='box')

Histogram over one column:
df['col1'].plot(kind='hist',
                bins=3)

Piechart:
df.plot(kind='pie',
        y='col1',
        title='Population')

Set tick marks:
labels = ['A', 'B', 'C', 'D']
positions = [1, 2, 3, 4]
plt.xticks(positions, labels)
plt.yticks(positions, labels)

Label diagram and axes:
plt.title('Correlation')
plt.xlabel('Nunstück')
plt.ylabel('Slotermeyer')

Save most recent diagram:
plt.savefig('plot.png')
plt.savefig('plot.png',dpi=300)
plt.savefig('plot.svg')
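The pivot and plotting entries can be chained into one end-to-end sketch. It assumes pandas and matplotlib are installed. The GDP figures are made up, the country names "A" and "B" are placeholders, and the non-interactive "Agg" backend and the temporary output folder are choices made only so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per (year, country) pair.
df_gdp = pd.DataFrame({
    "year": [2000, 2000, 2010, 2010],
    "country": ["A", "B", "A", "B"],
    "gdppc": [10.0, 20.0, 15.0, 30.0],
})

# pivot() reshapes it to wide form: one row per year, one column per country.
wide = df_gdp.pivot(index="year", columns="country", values="gdppc")

# Each column becomes one line; plot() returns the matplotlib Axes.
ax = wide.plot(kind="line", figsize=(8, 4))
ax.set_title("GDP per capita (made-up values)")
ax.set_xlabel("year")
ax.set_ylabel("gdppc")

# savefig() writes the most recently created figure.
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
plt.savefig(out_path, dpi=300)
plt.close("all")
```

In an interactive session the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used instead of saving to a file.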