Pandas Cheat Sheet    Free resources at: dataquest.io/guide
This cheat sheet offers a handy reference for essential pandas
commands, focused on efficient data manipulation and
analysis. Using examples from the Fortune 500 Companies
Dataset, it covers key pandas operations such as reading and
writing data, selecting and filtering DataFrame values, and
performing common transformations.

Designed to be practical and actionable, this guide ensures you
can quickly apply pandas' versatile data manipulation tools in
your workflow.

Importing Data: IMPORT, read_csv, read_table, read_excel, read_sql, read_json, read_html, clipboard, DataFrame
Exporting Data
iloc, Boolean Masks, Boolean Operators, Data Exploration, Assigning Values, Boolean Indexing
Data Cleaning: columns, isnull, notnull, dropna, fillna, astype, replace, rename, set_index
Finding Correlation, Converting a Column to Datetime
describe, mean, corr, count, max, min, median, std
View & Inspect Data: Frequency Table, Histogram, Vertical Bar Plot, Horizontal Bar Plot, Line Plot, Scatter Plot, head, tail, shape, info, describe, value_counts, apply
read_csv      pd.read_csv(filename)                       Reads from a CSV file
read_table    pd.read_table(filename)                     Reads from a delimited text file (like TSV)
to_excel      df.to_excel(filename)                       Writes to an Excel file
to_sql        df.to_sql(table_name, connection_object)    Writes to a SQL table
to_json       df.to_json(filename)                        Writes to a file in JSON format
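
As a rough illustration of the read and write calls listed above, the sketch below round-trips a tiny table through CSV and JSON text. The sample data and the names csv_text, json_text, and restored are invented for this example; an in-memory buffer stands in for a real filename so that the snippet runs without touching disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "company,revenues\nAlpha,100\nBeta,250\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# to_json returns a JSON string when no filename is passed.
json_text = df.to_json(orient="records")

# read_json rebuilds the same table from the JSON text.
restored = pd.read_json(io.StringIO(json_text))

print(restored)
```

Passing a filename instead of the buffer, as in the table, reads from or writes to that file.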
Dataframe
revenue_giants = f500.loc[["Apple", "Industrial & Commercial Bank of China",
                           "China Construction Bank", "Agricultural Bank of China"],
                          "revenues":"profit_change"]
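
The selection above combines a list of row labels with a slice of column labels. A minimal sketch of the same pattern follows, using a made-up three-row frame in place of the real f500 dataset; the company names and numbers are placeholders, not Fortune 500 data.

```python
import pandas as pd

# Hypothetical stand-in for f500, indexed by company name.
f500 = pd.DataFrame(
    {
        "rank": [1, 2, 3],
        "revenues": [500, 400, 300],
        "profits": [50, 40, 30],
        "profit_change": [1.5, -0.2, 0.7],
        "ceo": ["A", "B", "C"],
    },
    index=["Alpha Corp", "Beta Bank", "Gamma Ltd"],
)

# A list picks specific row labels; a label slice picks a run of columns.
# Unlike positional slices, label slices with .loc include the end label.
subset = f500.loc[["Alpha Corp", "Gamma Ltd"], "revenues":"profit_change"]

print(subset)
```

Because the slice is label-based, the result keeps revenues, profits, and profit_change, and leaves out rank and ceo.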
Data Exploration
revs = f500["revenues"]
summary_stats = revs.describe()                   Generate summary statistics for the revenues column in f500
country_freqs = f500["country"].value_counts()    Count the occurrences of each country in f500

Frequency Table
Series.value_counts()                 Generate a frequency table from a Series object
Series.value_counts().sort_index()    Generate a sorted frequency table from a Series object

Histogram
Series.plot.hist()    Generate a histogram from a Series object
plt.show()
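
To make the exploration calls above concrete, here is a small sketch on an invented six-row frame; the country and revenue values are placeholders rather than real f500 data. Plotting is left out so the snippet needs only pandas.

```python
import pandas as pd

# Hypothetical stand-in for f500 with just two columns.
f500 = pd.DataFrame(
    {
        "country": ["USA", "China", "USA", "Japan", "China", "USA"],
        "revenues": [10, 20, 30, 40, 50, 60],
    }
)

revs = f500["revenues"]

# describe() returns count, mean, std, min, quartiles, and max.
summary_stats = revs.describe()

# value_counts() orders the frequency table by count, highest first.
country_freqs = f500["country"].value_counts()

# sort_index() reorders the same table by its labels instead.
sorted_freqs = country_freqs.sort_index()

print(summary_stats)
print(country_freqs)
print(sorted_freqs)
```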
Assigning Values
top5_rank_revenue["year_founded"] = 0                 Set the year_founded column to 0
f500.loc["Dow Chemical", "ceo"] = "Jim Fitterling"    Update the CEO of Dow Chemical to Jim Fitterling

Vertical Bar Plot
Series.plot.bar()    Generate a vertical bar plot from a Series object
plt.show()
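
The two assignment forms above can be tried on a toy frame. In this sketch the data, the company names, and the variable top2 are all invented. The explicit .copy() is an assumption about how a subset such as top5_rank_revenue might be created, so that the new column does not touch the original frame.

```python
import pandas as pd

# Hypothetical stand-in for f500, indexed by company name.
f500 = pd.DataFrame(
    {"revenues": [500, 400, 300], "ceo": ["Old A", "Old B", "Old C"]},
    index=["Alpha Corp", "Beta Bank", "Gamma Ltd"],
)

# Take an independent copy of the first two rows before modifying it.
top2 = f500.head(2).copy()

# Assigning a scalar to a new column name fills every row with that value.
top2["year_founded"] = 0

# .loc with a row label and a column label updates a single cell in place.
f500.loc["Beta Bank", "ceo"] = "New B"

print(top2)
print(f500)
```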
head            df.head(n)                      First n rows of the DataFrame
info            df.info()                       Index, datatype, and memory information
describe        df.describe()                   Summary statistics for numerical columns
value_counts    s.value_counts(dropna=False)    Views unique values and counts

columns         df.columns = ['a', 'b', 'c']    Renames columns
dropna          df.dropna()                     Drops all rows that contain null values
                df.dropna(axis=1)               Drops all columns that contain null values
                df.dropna(axis=1, thresh=n)     Drops all columns that have fewer than n non-null values
astype          s.astype(float)                 Converts the datatype of the Series to float
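
The axis and thresh arguments of dropna are easy to mix up, so a short sketch on an invented 3x3 frame may help. With axis=1 the threshold is applied to columns; without it, to rows. All data and variable names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column a has 2 non-null values, b has 1, c has 3.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, 6.0],
        "c": [7.0, 8.0, 9.0],
    }
)

# Default: drop every row that contains at least one null.
rows_complete = df.dropna()

# axis=1: drop every column that contains at least one null.
cols_complete = df.dropna(axis=1)

# axis=1 with thresh: keep columns with at least 2 non-null values.
cols_thresh = df.dropna(axis=1, thresh=2)

# thresh alone: keep rows with at least 2 non-null values.
rows_thresh = df.dropna(thresh=2)

# astype converts a Series to another dtype once the nulls are gone.
c_as_int = cols_complete["c"].astype(int)

print(cols_thresh)
print(rows_thresh)
```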
1 with one
sort_values     df.sort_values(col1)             Sorts values by col1 in ascending order
groupby         df.groupby(col1).agg(np.mean)    Finds the average across all columns for every unique col1 group
apply           df.apply(np.mean)                Applies a function across each column
                df.apply(np.max, axis=1)         Applies a function across each row

describe        df.describe()                    Summary statistics for numerical columns
mean            df.mean()                        Returns the mean of all columns
corr            df.corr()                        Returns the correlation between columns in a DataFrame
count           df.count()                       Returns the number of non-null values in each DataFrame column
max             df.max()                         Returns the highest value in each column
min             df.min()                         Returns the lowest value in each column
median          df.median()                      Returns the median of each column
std             df.std()                         Returns the standard deviation of each column

Join & Combine
Syntax for      How to use                       Explained
append          df1.append(df2)                  Adds the rows in df2 to the end of df1 (number of columns should be identical).
                                                 DataFrame.append was removed in pandas 2.0; pd.concat([df1, df2]) gives the same result.
concat          pd.concat([df1, df2], axis=1)    Adds the columns in df2 to the end of df1 (number of rows should be identical)
join            df1.join(df2, on=col1, how='inner')    SQL-style joins the columns in df1 with the columns