Cheat Sheet Data Preprocessing Tasks in Pandas


Data Visualization with Python


Cheat Sheet: Data Preprocessing Tasks in Pandas

| Task | Syntax | Description | Example |
|---|---|---|---|
| Load CSV data | pd.read_csv('filename.csv') | Read data from a CSV file into a Pandas DataFrame | df_can = pd.read_csv('data.csv') |
| Handling Missing Values | df.dropna() | Drop rows with missing values | df_can.dropna() |
| Handling Missing Values | df.fillna(value) | Fill missing values with a specified value | df_can.fillna(0) |
| Removing Duplicates | df.drop_duplicates() | Remove duplicate rows | df_can.drop_duplicates() |
| Renaming Columns | df.rename(columns={'old_name': 'new_name'}) | Rename one or more columns | df_can.rename(columns={'Age': 'Years'}) |
| Selecting Columns | df['column_name'] or df.column_name | Select a single column | df_can.Age or df_can['Age'] |
| Selecting Columns | df[['col1', 'col2']] | Select multiple columns | df_can[['Name', 'Age']] |
| Filtering Rows | df[df['column'] > value] | Filter rows based on a condition | df_can[df_can['Age'] > 30] |
| Applying Functions to Columns | df['column'].apply(function_name) | Apply a function to transform values in a column | df_can['Age'].apply(lambda x: x + 1) |
| Creating New Columns | df['new_column'] = expression | Create a new column with values derived from existing ones | df_can['Total'] = df_can['Quantity'] * df_can['Price'] |
| Grouping and Aggregating | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean'}) | Group rows by a column and apply aggregate functions | df_can.groupby('Category').agg({'Total': 'mean'}) |
| Sorting Rows | df.sort_values('column', ascending=True/False) | Sort rows based on a column | df_can.sort_values('Date', ascending=True) |
| Displaying First n Rows | df.head(n) | Show the first n rows of the DataFrame | df_can.head(3) |
| Displaying Last n Rows | df.tail(n) | Show the last n rows of the DataFrame | df_can.tail(3) |
| Checking for Null Values | df.isnull() | Check for null values in the DataFrame | df_can.isnull() |
| Selecting Rows by Index | df.iloc[index] | Select rows based on integer index | df_can.iloc[3] |
| Selecting Rows by Index | df.iloc[start:end] | Select rows in a specified range | df_can.iloc[2:5] |
| Selecting Rows by Label | df.loc[label] | Select rows based on label/index name | df_can.loc['Label'] |
| Selecting Rows by Label | df.loc[start:end] | Select rows in a specified label/index range | df_can.loc['Age':'Quantity'] |
| Summary Statistics | df.describe() | Generate descriptive statistics for numerical columns | df_can.describe() |
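The loading and cleaning tasks above can be chained into a short sketch. The CSV text here is made-up sample data read from an in-memory buffer rather than a real 'data.csv', and the intermediate names (deduped, dropped, filled, renamed) are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of 'data.csv'.
csv_text = """Name,Age,Quantity,Price
Ana,34,2,10.0
Ben,,1,5.0
Ana,34,2,10.0
Cy,28,,2.5
"""

# read_csv accepts a file path or any file-like object.
df_can = pd.read_csv(io.StringIO(csv_text))

# isnull() marks missing cells; summing counts them per column.
missing_per_column = df_can.isnull().sum()

# Each call below returns a new DataFrame and leaves df_can unchanged.
deduped = df_can.drop_duplicates()                 # drop the repeated 'Ana' row
dropped = deduped.dropna()                         # keep only fully populated rows
filled = deduped.fillna(0)                         # or replace missing values with 0
renamed = filled.rename(columns={'Age': 'Years'})  # rename a single column

print(missing_per_column)
print(renamed.head(3))
```

Because these methods return new DataFrames by default, the result has to be assigned to a name (or back to df_can) for the change to persist.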

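The selection, transformation, and aggregation rows of the preprocessing table can be tried on a small DataFrame. This is a minimal sketch: the data, the string index labels ('w' to 'z'), and the result variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with string row labels so that loc and iloc differ visibly.
df_can = pd.DataFrame(
    {
        'Category': ['A', 'B', 'A', 'B'],
        'Age': [25, 41, 33, 29],
        'Quantity': [2, 1, 4, 3],
        'Price': [10.0, 5.0, 2.5, 4.0],
    },
    index=['w', 'x', 'y', 'z'],
)

# Column selection: one column gives a Series, a list of columns a DataFrame.
ages = df_can['Age']
subset = df_can[['Category', 'Age']]

# Row filtering with a boolean condition.
over_30 = df_can[df_can['Age'] > 30]

# New column derived from existing ones, and an element-wise transform.
df_can['Total'] = df_can['Quantity'] * df_can['Price']
next_year = df_can['Age'].apply(lambda x: x + 1)

# Group by a column and aggregate another.
mean_total = df_can.groupby('Category').agg({'Total': 'mean'})

# Sort rows by a column, largest first.
by_age = df_can.sort_values('Age', ascending=False)

# iloc slices by position (end excluded); loc slices by label (end included).
by_position = df_can.iloc[1:3]
by_label = df_can.loc['x':'z']

print(mean_total)
print(df_can.describe())
```

Note the difference in the last two selections: the positional slice 1:3 returns two rows, while the label slice 'x':'z' returns three because loc includes the end label.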
Cheat Sheet: Plot Libraries

| Library | Main Purpose | Key Features | Programming Language | Level of Customization | Dashboard Capabilities | Types of Plots Possible |
|---|---|---|---|---|---|---|
| Matplotlib | General-purpose plotting | Comprehensive plot types and variety of customization options | Python | High | Requires additional components and customization | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, heatmaps, etc. |
| Pandas | Fundamentally used for data manipulation but also has plotting functionality | Easy to plot directly on Pandas data structures | Python | Medium | Can be combined with web frameworks for creating dashboards | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, etc. |
| Seaborn | Statistical data visualization | Stylish, specialized statistical plot types | Python | Medium | Can be combined with other libraries to display plots on dashboards | Heatmaps, violin plots, scatter plots, bar plots, count plots, etc. |
| Plotly | Interactive data visualization | Interactive web-based visualizations | Python, R, JavaScript | High | Dash framework is dedicated to building interactive dashboards | Line plots, scatter plots, bar charts, pie charts, 3D plots, choropleth maps, etc. |
| Folium | Geospatial data visualization | Interactive, customizable maps | Python | Medium | Can be integrated with other frameworks/libraries to incorporate maps into dashboards | Choropleth maps, point maps, heatmaps, etc. |
| PyWaffle | Plotting waffle charts | Waffle charts | Python | Low | Can be combined with other libraries to display waffle charts on dashboards | Waffle charts, square pie charts, donut charts, etc. |
