
Python Cheat Sheet Code Academy

This document provides a cheat sheet for the Pandas library in Python. It summarizes key functions for importing and exporting data, selecting and filtering data, cleaning and transforming data, joining/combining data, and descriptive statistics. Some important functions covered include reading/writing CSV/Excel files, selecting columns/rows, dropping null values, grouping/pivoting data, concatenating DataFrames, and calculating means, medians, and standard deviations. The cheat sheet is intended to be a handy reference for common Pandas tasks.
Copyright © All Rights Reserved

LEARN DATA SCIENCE ONLINE

Start Learning For Free - [Link]

Data Science Cheat Sheet


Pandas

KEY IMPORTS
We'll use shorthand in this cheat sheet:
df - A pandas DataFrame object
s - A pandas Series object
Import these to start:
import pandas as pd
import numpy as np

IMPORTING DATA
pd.read_csv(filename) - From a CSV file
pd.read_table(filename) - From a delimited text file (like TSV)
pd.read_excel(filename) - From an Excel file
pd.read_sql(query, connection_object) - Reads from a SQL table/database
pd.read_json(json_string) - Reads from a JSON formatted string, URL or file
pd.read_html(url) - Parses an HTML URL, string or file and extracts tables to a list of DataFrames
pd.read_clipboard() - Takes the contents of your clipboard and passes it to read_csv()
pd.DataFrame(dict) - From a dict: keys become column names, values (as lists) become the data

EXPORTING DATA
df.to_csv(filename) - Writes to a CSV file
df.to_excel(filename) - Writes to an Excel file
df.to_sql(table_name, connection_object) - Writes to a SQL table
df.to_json(filename) - Writes to a file in JSON format
df.to_html(filename) - Saves as an HTML table
df.to_clipboard() - Writes to the clipboard

CREATE TEST OBJECTS
Useful for testing
pd.DataFrame(np.random.rand(20,5)) - 5 columns and 20 rows of random floats
pd.Series(my_list) - Creates a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]) - Adds a date index

VIEWING/INSPECTING DATA
df.head(n) - First n rows of the DataFrame
df.tail(n) - Last n rows of the DataFrame
df.shape - Number of rows and columns (an attribute, so no parentheses)
df.info() - Index, datatype and memory information
df.describe() - Summary statistics for numerical columns
s.value_counts(dropna=False) - Views unique values and counts
df.apply(pd.Series.value_counts) - Unique values and counts for all columns

SELECTION
df[col] - Returns the column with label col as a Series
df[[col1, col2]] - Returns the columns as a new DataFrame
s.iloc[0] - Selection by position
s.loc[0] - Selection by index label
df.iloc[0,:] - First row
df.iloc[0,0] - First element of first column

DATA CLEANING
df.columns = ['a','b','c'] - Renames columns
df.isnull() - Checks for null values, returns a Boolean array
df.notnull() - Opposite of df.isnull()
df.dropna() - Drops all rows that contain null values
df.dropna(axis=1) - Drops all columns that contain null values
df.dropna(axis=1,thresh=n) - Drops all columns that have fewer than n non-null values
df.fillna(x) - Replaces all null values with x
s.fillna(s.mean()) - Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) - Converts the datatype of the Series to float
s.replace(1,'one') - Replaces all values equal to 1 with 'one'
s.replace([1,3],['one','three']) - Replaces all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) - Mass renaming of columns (numeric column labels)
df.rename(columns={'old_name': 'new_name'}) - Selective renaming
df.set_index('column_one') - Changes the index
df.rename(index=lambda x: x + 1) - Mass renaming of index

FILTER, SORT, & GROUPBY
df[df[col] > 0.5] - Rows where the col column is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] - Rows where 0.7 > col > 0.5
df.sort_values(col1) - Sorts values by col1 in ascending order
df.sort_values(col2,ascending=False) - Sorts values by col2 in descending order
df.sort_values([col1,col2],ascending=[True,False]) - Sorts values by col1 in ascending order, then by col2 in descending order
df.groupby(col) - Returns a groupby object for values from one column
df.groupby([col1,col2]) - Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() - Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)
df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean') - Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg('mean') - Finds the average across all columns for every unique col1 group
df.apply(np.mean) - Applies a function across each column
df.apply(np.max,axis=1) - Applies a function across each row

JOIN/COMBINE
df1.append(df2) - Adds the rows in df2 to the end of df1 (columns should be identical). Removed in pandas 2.0; use pd.concat([df1, df2]) instead
pd.concat([df1, df2],axis=1) - Adds the columns in df2 to the right of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') - SQL-style join that matches the values in df1's col1 against the index of df2. how can be one of 'left', 'right', 'outer', 'inner'

STATISTICS
These can all be applied to a Series as well.
df.describe() - Summary statistics for numerical columns
df.mean() - Returns the mean of all columns
df.corr() - Returns the correlation between columns in a DataFrame
df.count() - Returns the number of non-null values in each DataFrame column
df.max() - Returns the highest value in each column
df.min() - Returns the lowest value in each column
df.median() - Returns the median of each column
df.std() - Returns the standard deviation of each column
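The importing and exporting entries are easiest to try as round trips. Below is a minimal sketch, assuming pandas 2.x and the standard-library sqlite3 module. The column names and the table name `scores` are hypothetical. It uses in-memory buffers instead of files. It also wraps the JSON text in io.StringIO, because recent pandas releases deprecate passing a literal JSON string to read_json.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical data built from a dict: keys -> column names, lists -> column values
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.25]})

# CSV round trip through an in-memory text buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)  # index=False keeps the row index out of the CSV
buf.seek(0)
from_csv = pd.read_csv(buf)

# JSON round trip; the JSON text is wrapped in StringIO before reading it back
json_text = df.to_json()
from_json = pd.read_json(io.StringIO(json_text))

# SQL round trip with an in-memory SQLite database (sqlite3 connections are accepted directly)
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
from_sql = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

print(from_csv)
print(from_sql)
```

Swapping the buffer for a filename gives the file-based form shown in the cheat sheet.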
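The CREATE TEST OBJECTS and VIEWING/INSPECTING DATA entries combine naturally. The sketch below builds a random frame and adds a daily date index, then inspects the result. It assumes a fixed NumPy seed so that reruns produce the same numbers. The small Series passed to value_counts is hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the random test data is repeatable
df = pd.DataFrame(np.random.rand(20, 5))  # 20 rows x 5 columns of floats in [0, 1)
df.index = pd.date_range("1900/1/30", periods=df.shape[0])  # one daily date per row

print(df.shape)    # (rows, columns) tuple; an attribute, so no parentheses
print(df.head(3))  # first 3 rows
print(df.tail(3))  # last 3 rows
df.info()          # prints index, dtypes and memory usage
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column

s = pd.Series([1, 2, 2, None])
counts = s.value_counts(dropna=False)  # NaN is counted as a value of its own
print(counts)
```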
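The SELECTION entries separate selection by position (iloc) from selection by label (loc). The sketch below gives a Series an index whose labels deliberately differ from the positions, so the two return different elements. The data and column names are hypothetical.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[2, 1, 0])  # labels run opposite to positions
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["x", "y", "z"], "col3": [0.1, 0.2, 0.3]})

by_position = s.iloc[0]   # first element by position -> 10
by_label = s.loc[0]       # element whose index label is 0 -> 30

one_col = df["col1"]              # single brackets -> Series
two_cols = df[["col1", "col2"]]   # list of labels -> new DataFrame
first_row = df.iloc[0, :]         # first row as a Series
first_cell = df.iloc[0, 0]        # first element of the first column -> 1

print(by_position, by_label, first_cell)
```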
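Several DATA CLEANING entries behave differently depending on axis and thresh. The sketch below applies them to a small hypothetical frame with missing values, which makes those differences visible. The column names "a", "b" and "c" are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],        # 3 non-null values
    "b": [np.nan, np.nan, np.nan, 4.0],  # 1 non-null value
    "c": [5.0, 6.0, 7.0, 8.0],           # no nulls
})

mask = df.isnull()                         # Boolean DataFrame, True where a value is missing
rows_complete = df.dropna()                # keeps only rows without nulls (row 3 here)
cols_complete = df.dropna(axis=1)          # keeps only columns without nulls ("c" here)
cols_thresh = df.dropna(axis=1, thresh=2)  # keeps columns with at least 2 non-null values
filled = df.fillna(0)                      # every null replaced with 0
a_filled = df["a"].fillna(df["a"].mean())  # nulls in "a" replaced with the column mean

s = pd.Series([1, 2, 3, 1])
as_float = s.astype(float)                         # int64 -> float64
words = s.replace([1, 3], ["one", "three"])        # 1 -> 'one', 3 -> 'three'

renamed = df.rename(columns={"a": "alpha"})  # selective renaming
indexed = df.set_index("c")                  # column "c" becomes the index

print(rows_complete)
print(cols_thresh.columns.tolist())
```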
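The FILTER, SORT, & GROUPBY entries are shown together on one table in the sketch below. It assumes a small hypothetical sales table with columns "region", "units" and "price". It passes aggfunc as the string 'mean', which current pandas accepts without deprecation warnings.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 4, 6, 8, 3],
    "price": [0.55, 0.80, 0.65, 0.40, 0.90],
})

# Boolean filter: rows where 0.7 > price > 0.5 (each condition needs its own parentheses)
mid_price = df[(df["price"] > 0.5) & (df["price"] < 0.7)]

# Sort by region ascending, then by units descending within each region
ordered = df.sort_values(["region", "units"], ascending=[True, False])

# Mean of one column per group, and mean of every non-grouping column per group
mean_units = df.groupby("region")["units"].mean()
all_means = df.groupby("region").agg("mean")

# Pivot table grouping by region and averaging two value columns
pivot = df.pivot_table(index="region", values=["units", "price"], aggfunc="mean")

# Apply a function across each row of the numeric columns
row_max = df[["units", "price"]].apply(np.max, axis=1)

print(pivot)
```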
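For the JOIN/COMBINE entries, the direction of each operation is easy to get backwards, so the sketch below makes it explicit. It uses pd.concat for row-wise appending because DataFrame.append was removed in pandas 2.0. It also shows join with on=, which matches a column of the calling frame against the other frame's index. All frames and labels are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["d"], "x": [4]})

# Rows of df2 added below df1; ignore_index renumbers the combined rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)

# Columns of `extra` added to the right of df1 (rows are aligned on the index)
extra = pd.DataFrame({"y": [10, 20, 30]})
side_by_side = pd.concat([df1, extra], axis=1)

# df1["key"] is matched against the index of `lookup`; 'inner' keeps only matches
lookup = pd.DataFrame({"label": ["Alpha", "Gamma"]}, index=["a", "c"])
joined = df1.join(lookup, on="key", how="inner")

print(stacked)
print(joined)
```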
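The STATISTICS entries all reduce each column to a single value, and most skip missing values by default. The sketch below uses a small hypothetical frame with one null to show this. It also applies a statistic to a single Series, as the section notes is possible. Note that std() is the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, None]})

means = df.mean()      # nulls are skipped: b's mean uses 2, 4, 6
counts = df.count()    # non-null values per column
medians = df.median()
highs = df.max()
lows = df.min()
stds = df.std()        # sample standard deviation (ddof=1)
corr = df.corr()       # pairwise correlation, using rows where both values are present

s = df["a"]            # the same methods work on a Series
print(s.median())
print(corr)
```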
