pandas Cheat Sheet
Free resources at: dataquest.io/guide

This pandas cheat sheet serves as a quick reference for essential commands related to data manipulation and analysis, including importing, cleaning, and exporting data. It covers operations like filtering, sorting, grouping, and calculating statistics, with practical examples using Fortune 500 Companies data. The guide is designed for efficient application of pandas functionalities in data workflows.

pandas Cheat Sheet Table of Contents

This cheat sheet offers a handy reference for essential pandas commands, focused on efficient data manipulation and analysis. Using examples from the Fortune 500 Companies dataset, it covers key pandas operations such as reading and writing data, selecting and filtering DataFrame values, and performing common transformations.

You'll find easy-to-follow examples for grouping, sorting, and aggregating data, as well as calculating statistics like mean, correlation, and summary statistics. Whether you're cleaning datasets, analyzing trends, or visualizing data, this cheat sheet provides concise instructions to help you navigate pandas' powerful functionality.

Designed to be practical and actionable, this guide ensures you can quickly apply pandas' versatile data manipulation tools in your workflow.

Table of Contents:

- Importing Data: IMPORT, read_csv, read_table, read_excel, read_sql, read_json, read_html, clipboard, DataFrame
- Exporting Data: to_csv, to_excel, to_sql, to_json, to_html, to_clipboard
- Create Test Objects: DataFrame, Series, index
- Working with DataFrames: DataFrame Basics, Selecting DataFrame Values, loc, iloc, Boolean Masks, Boolean Operators, Data Exploration, Assigning Values, Boolean Indexing
- View & Inspect Data: Frequency Table, Histogram, Vertical Bar Plot, Horizontal Bar Plot, Line Plot, Scatter Plot, head, tail, shape, info, describe, value_counts, apply
- Data Cleaning: columns, isnull, notnull, dropna, fillna, astype, replace, rename, set_index, Finding Correlation, Converting a Column to Datetime
- Filter, Sort, & Group By: columns, sort_values, groupby, pivot_table, apply
- Join & Combine: append, concat, join
- Statistics: describe, mean, corr, count, max, min, median, std

Importing Data

| Syntax | How to use | Explained |
|---|---|---|
| IMPORT | import pandas as pd | Import the library using its standard alias |
| read_csv | pd.read_csv(filename) | Reads from a CSV file |
| read_table | pd.read_table(filename) | Reads from a delimited text file (like TSV) |
| read_excel | pd.read_excel(filename) | Reads from an Excel file |
| read_sql | pd.read_sql(query, connection_object) | Reads from a SQL table/database |
| read_json | pd.read_json(json_string) | Reads from a JSON-formatted string, URL, or file |
| read_html | pd.read_html(url) | Parses an HTML URL, string, or file and extracts tables to a list of dataframes |
| clipboard | pd.read_clipboard() | Reads the contents of your clipboard |
| DataFrame | pd.DataFrame(dict) | Reads from a dict; keys for column names, values for data as lists |

Exporting Data

| Syntax | How to use | Explained |
|---|---|---|
| to_csv | df.to_csv(filename) | Writes to a CSV file |
| to_excel | df.to_excel(filename) | Writes to an Excel file |
| to_sql | df.to_sql(table_name, connection_object) | Writes to a SQL table |
| to_json | df.to_json(filename) | Writes to a file in JSON format |
| to_html | df.to_html(filename) | Writes to an HTML table |
| to_clipboard | df.to_clipboard() | Writes to the clipboard |

Create Test Objects

| Syntax | How to use | Explained |
|---|---|---|
| DataFrame | pd.DataFrame(np.random.rand(20, 5)) | 5 columns and 20 rows of random floats |
| Series | pd.Series(my_list) | Creates a Series from an existing list object |
| index | df.index = pd.date_range('1900/1/30', periods=df.shape[0]) | Adds a date index |
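The import and export functions mirror each other, so a common pattern is to read a file, transform it, and write the result back out. Below is a minimal round-trip sketch; the file names are hypothetical, following the Fortune 500 examples used throughout this sheet.

```python
import pandas as pd

# Read the Fortune 500 dataset, treating the first column
# (the company name) as the index (assumes an f500.csv file exists)
f500 = pd.read_csv('f500.csv', index_col=0)

# ... clean or transform the data here ...

# Write the result back out in a few of the supported formats
f500.to_csv('f500_clean.csv')      # CSV file
f500.to_json('f500_clean.json')    # JSON file
f500.to_excel('f500_clean.xlsx')   # Excel file (needs openpyxl installed)
```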

Working with DataFrames

| Syntax | How to use | Explained |
|---|---|---|
| DataFrame Basics | f500 = pd.read_csv('f500.csv', index_col=0) | Read a CSV file into a DataFrame |
| | col_types = f500.dtypes | Return the data type of each column in a DataFrame |
| | dims = f500.shape | Return the dimensions of a DataFrame |
| Selecting DataFrame Values | f500["rank"] | Select the rank column from f500 |
| | f500[["country", "rank"]] | Select the country and rank columns from f500 |
| | first_five = f500.head(5) | Select the first five rows from f500 |
| loc | big_movers = f500.loc[["Aviva", "HP", "JD.com", "BHP Billiton"], ["rank", "previous_rank"]] | Use .loc[] to select rows and columns from f500 by label; rows are specified first, followed by columns. You can select individual rows/columns or multiple by passing a list, and label-based slicing includes both the start and end labels. |
| | bottom_companies = f500.loc["National Grid":"AutoNation", ["rank", "sector", "country"]] | |
| | revenue_giants = f500.loc[["Apple", "Industrial & Commercial Bank of China", "China Construction Bank", "Agricultural Bank of China"], "revenues":"profit_change"] | |
| iloc | third_row_first_col = f500.iloc[2, 0] | Select the third row, first column by integer location |
| | second_row = f500.iloc[1] | Select the second row by integer location |
| Boolean Masks | rev_is_null = f500["revenue_change"].isnull() | Check for null values in the revenue_change column |
| | rev_change_null = f500[rev_is_null] | Filter using a Boolean array |
| | f500[f500["previous_rank"].notnull()] | Filter rows where previous_rank is not null |
| Boolean Operators | filter_big_rev_neg_profit = (f500["revenues"] > 100000) & (f500["profits"] < 0) | Create a Boolean filter for companies with revenues greater than 100,000 and profits less than 0 |
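To see how label-based and position-based selection combine with Boolean masks, here is a small self-contained sketch; the DataFrame contents are invented for illustration and only the column names echo the f500 examples above.

```python
import pandas as pd

# A tiny stand-in for the Fortune 500 data (values are invented)
f500 = pd.DataFrame(
    {"rank": [1, 2, 3],
     "revenues": [485873, 315199, 267518],
     "profits": [13643, -1000, 4200]},
    index=["Walmart", "State Grid", "Sinopec Group"],
)

# Label-based selection: rows first, then columns
print(f500.loc["Walmart", ["rank", "revenues"]])

# Position-based selection: third row, first column
print(f500.iloc[2, 0])

# Boolean mask: big revenues but negative profits
mask = (f500["revenues"] > 100000) & (f500["profits"] < 0)
print(f500[mask])
```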

Working with DataFrames (continued)

| Syntax | How to use | Explained |
|---|---|---|
| Data Exploration | revs = f500["revenues"]; summary_stats = revs.describe() | Generate summary statistics for the revenues column in f500 |
| | country_freqs = f500["country"].value_counts() | Count the occurrences of each country in f500 |
| Assigning Values | top5_rank_revenue["year_founded"] = 0 | Set the year_founded column to 0 |
| | f500.loc["Dow Chemical", "ceo"] = "Jim Fitterling" | Update the CEO of Dow Chemical to Jim Fitterling |
| Boolean Indexing | kr_bool = f500["country"] == "South Korea"; top_5_kr = f500[kr_bool].head() | Filter rows for South Korea and display the top 5 |
| | f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan; prev_rank_after = f500["previous_rank"].value_counts(dropna=False).head() | Replace 0 with NaN in the previous_rank column and show the top 5 most common values |

View & Inspect Data

| Syntax | How to use | Explained |
|---|---|---|
| Frequency Table | Series.value_counts() | Generate a frequency table from a Series object |
| | Series.value_counts().sort_index() | Generate a sorted frequency table from a Series object |
| Histogram | Series.plot.hist(); plt.show() | Generate a histogram from a Series object |
| Vertical Bar Plot | Series.plot.bar(); plt.show() | Generate a vertical bar plot from a Series object |
| Horizontal Bar Plot | Series.plot.barh(); plt.show() | Generate a horizontal bar plot from a Series object |
| Line Plot | DataFrame.plot.line(x='col_1', y='col_2'); plt.show() | Generate a line plot from a DataFrame object |
| Scatter Plot | DataFrame.plot.scatter(x='col_1', y='col_2'); plt.show() | Generate a scatter plot from a DataFrame object |
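The plotting rows above all go through pandas' matplotlib-based .plot accessor. A minimal sketch, assuming matplotlib is installed and using an invented country column:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented sample data for illustration
countries = pd.Series(["USA", "China", "USA", "Japan", "China", "USA"])

# Frequency table, then a horizontal bar plot of the counts
country_freqs = countries.value_counts()
country_freqs.plot.barh()
plt.xlabel("Number of companies")
plt.show()
```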

View & Inspect Data (continued)

| Syntax | How to use | Explained |
|---|---|---|
| head | df.head(n) | First n rows of the DataFrame |
| tail | df.tail(n) | Last n rows of the DataFrame |
| shape | df.shape | Number of rows and columns |
| info | df.info() | Index, datatype, and memory information |
| describe | df.describe() | Summary statistics for numerical columns |
| value_counts | s.value_counts(dropna=False) | Views unique values and counts |
| apply | df.apply(pd.Series.value_counts) | Unique values and counts for all columns |

Data Cleaning

| Syntax | How to use | Explained |
|---|---|---|
| columns | df.columns = ['a', 'b', 'c'] | Renames columns |
| isnull | pd.isnull(df) | Checks for null values; returns a Boolean array |
| notnull | pd.notnull(df) | Opposite of pd.isnull() |
| dropna | df.dropna() | Drops all rows that contain null values |
| | df.dropna(axis=1) | Drops all columns that contain null values |
| | df.dropna(thresh=n) | Drops all rows that have fewer than n non-null values |
| fillna | df.fillna(x) | Replaces all null values with x |
| | s.fillna(s.mean()) | Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section) |
| astype | s.astype(float) | Converts the datatype of the Series to float |
| replace | s.replace(1, 'one') | Replaces all values equal to 1 with 'one' |
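The dropna, fillna, and astype rows above compose naturally into a small cleaning pass. A minimal sketch with invented data:

```python
import numpy as np
import pandas as pd

# Invented data with missing values in both columns
df = pd.DataFrame({
    "revenue": [100.0, np.nan, 250.0],
    "sector": ["Tech", "Energy", None],
})

df = df.dropna(subset=["sector"])                           # drop rows missing a sector
df["revenue"] = df["revenue"].fillna(df["revenue"].mean())  # fill gaps with the column mean
df["revenue"] = df["revenue"].astype(float)                 # ensure a float dtype
print(df)
```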

Data Cleaning (continued)

| Syntax | How to use | Explained |
|---|---|---|
| replace | s.replace([1, 3], ['one', 'three']) | Replaces all 1 with 'one' and 3 with 'three' |
| rename | df.rename(columns=lambda x: x + 1) | Mass renaming of columns |
| | df.rename(columns={'old_name': 'new_name'}) | Selective renaming of columns |
| | df.rename(index=lambda x: x + 1) | Mass renaming of index |
| set_index | df.set_index('column_one') | Sets the index to column_one |
| Finding Correlation | f500['revenues'].corr(f500['profits']) | Calculate Pearson's r correlation between revenues and profits |
| | f500.corr() | Calculate the Pearson's r correlation matrix between all columns of f500 |
| | f500.corr()[['revenues', 'profits', 'assets']] | Calculate the correlation matrix for f500 and select the correlations for the revenues, profits, and assets columns |
| Converting a Column to Datetime | f500['founding_date'] = pd.to_datetime(f500['founding_date']) | Convert the founding_date column in f500 to datetime format |

Filter, Sort, & Group By

| Syntax | How to use | Explained |
|---|---|---|
| columns | df[df[col] > 0.5] | Rows where the col column is greater than 0.5 |
| | df[(df[col] > 0.5) & (df[col] < 0.7)] | Rows where 0.7 > col > 0.5 |
| sort_values | df.sort_values(col1) | Sorts values by col1 in ascending order |
| | df.sort_values(col2, ascending=False) | Sorts values by col2 in descending order |
| | df.sort_values([col1, col2], ascending=[True, False]) | Sorts values by col1 in ascending order, then col2 in descending order |
| groupby | df.groupby(col) | Returns a groupby object for values from one column |
| | df.groupby([col1, col2]) | Returns a groupby object for values from multiple columns |
| | df.groupby(col1)[col2].mean() | Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section) |
| pivot_table | df.pivot_table(index=col1, values=[col2, col3], aggfunc='mean') | Creates a pivot table that groups by col1 and calculates the mean of col2 and col3 |
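Grouping and pivoting are easiest to see on concrete data. A minimal sketch with invented sector and revenue values:

```python
import pandas as pd

# Invented data for illustration
df = pd.DataFrame({
    "sector": ["Tech", "Tech", "Energy", "Energy"],
    "revenues": [300, 200, 150, 250],
    "profits": [30, 20, -5, 25],
})

# Mean revenues per sector
print(df.groupby("sector")["revenues"].mean())

# Pivot table: mean of both numeric columns, grouped by sector
print(df.pivot_table(index="sector", values=["revenues", "profits"], aggfunc="mean"))
```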

Filter, Sort, & Group By (continued)

| Syntax | How to use | Explained |
|---|---|---|
| groupby | df.groupby(col1).agg(np.mean) | Finds the average across all columns for every unique col1 group |
| apply | df.apply(np.mean) | Applies a function across each column |
| | df.apply(np.max, axis=1) | Applies a function across each row |

Join & Combine

| Syntax | How to use | Explained |
|---|---|---|
| append | df1.append(df2) | Adds the rows in df2 to the end of df1 (number of columns should be identical; removed in pandas 2.0, use pd.concat([df1, df2]) instead) |
| concat | pd.concat([df1, df2], axis=1) | Adds the columns in df2 to the end of df1 (number of rows should be identical) |
| join | df1.join(df2, on=col1, how='inner') | SQL-style join of the columns in df1 with the columns in df2 where the rows for col1 have identical values; how can be one of 'left', 'right', 'outer', 'inner' |

Statistics

| Syntax | How to use | Explained |
|---|---|---|
| describe | df.describe() | Summary statistics for numerical columns |
| mean | df.mean() | Returns the mean of all columns |
| corr | df.corr() | Returns the correlation between columns in a DataFrame |
| count | df.count() | Returns the number of non-null values in each DataFrame column |
| max | df.max() | Returns the highest value in each column |
| min | df.min() | Returns the lowest value in each column |
| median | df.median() | Returns the median of each column |
| std | df.std() | Returns the standard deviation of each column |