
Datavischeatsheet

This document is a cheat sheet covering common data preprocessing tasks in Pandas and the main Python libraries for data visualization. For preprocessing, it lists functions and their usage for tasks such as loading CSV data, handling missing values, removing duplicates, selecting and filtering data, grouping and aggregating data, and generating summary statistics. For visualization, it summarizes each library's main purpose, key features, level of customization, dashboard capabilities, and the types of plots it can produce. The libraries covered are Matplotlib, Pandas, Seaborn, Plotly, Folium, and PyWaffle.

Uploaded by rcg97.hd
Copyright © All Rights Reserved

28/07/23, 10:11 PM

Skills Network

Data Visualization with Python


Cheat Sheet: Data Preprocessing Tasks in Pandas

| Task | Syntax | Description | Example |
|---|---|---|---|
| Load CSV data | pd.read_csv('filename.csv') | Read data from a CSV file into a Pandas DataFrame | df_can = pd.read_csv('data.csv') |
| Handling Missing Values | df.dropna() | Drop rows with missing values | df_can.dropna() |
| | df.fillna(value) | Fill missing values with a specified value | df_can.fillna(0) |
| Removing Duplicates | df.drop_duplicates() | Remove duplicate rows | df_can.drop_duplicates() |
| Renaming Columns | df.rename(columns={'old_name': 'new_name'}) | Rename one or more columns | df_can.rename(columns={'Age': 'Years'}) |
| Selecting Columns | df['column_name'] or df.column_name | Select a single column | df_can['Age'] or df_can.Age |
| | df[['col1', 'col2']] | Select multiple columns | df_can[['Name', 'Age']] |
| Filtering Rows | df[df['column'] > value] | Filter rows based on a condition | df_can[df_can['Age'] > 30] |
| Applying Functions to Columns | df['column'].apply(function_name) | Apply a function to transform the values in a column | df_can['Age'].apply(lambda x: x + 1) |
| Creating New Columns | df['new_column'] = expression | Create a new column with values derived from existing ones | df_can['Total'] = df_can['Quantity'] * df_can['Price'] |
| Grouping and Aggregating | df.groupby('column').agg({'col1': 'sum', 'col2': 'mean'}) | Group rows by a column and apply aggregate functions | df_can.groupby('Category').agg({'Total': 'mean'}) |
| Sorting Rows | df.sort_values('column', ascending=True/False) | Sort rows based on a column | df_can.sort_values('Date', ascending=True) |
| Displaying First n Rows | df.head(n) | Show the first n rows of the DataFrame | df_can.head(3) |
| Displaying Last n Rows | df.tail(n) | Show the last n rows of the DataFrame | df_can.tail(3) |
| Checking for Null Values | df.isnull() | Check for null values; returns a Boolean DataFrame of the same shape | df_can.isnull() |
| Selecting Rows by Integer Position | df.iloc[index] | Select a row by its integer position | df_can.iloc[3] |
| | df.iloc[start:end] | Select rows in a position range (end position excluded) | df_can.iloc[2:5] |
| Selecting Rows by Label | df.loc[label] | Select rows based on label/index name | df_can.loc['Label'] |
| | df.loc[start:end] | Select rows in a label/index range (both end labels included) | df_can.loc['Label1':'Label2'] |
| Summary Statistics | df.describe() | Generate descriptive statistics for numerical columns | df_can.describe() |

Note: dropna(), fillna(), drop_duplicates(), rename(), and sort_values() return a new DataFrame by default and leave the original unchanged. Assign the result (for example, df_can = df_can.dropna()) to keep the change.
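The table lists each call in isolation. The sketch below chains several of them on a small sample, including the difference between `iloc` and `loc` slicing. The CSV contents, the names and values in it, and the use of `io.StringIO` in place of a real `data.csv` file are all hypothetical and chosen only for illustration. Only the Pandas calls themselves come from the table.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'data.csv'; column names mirror the
# examples in the table (Name, Age, Category, Quantity, Price, Date).
CSV_TEXT = """Name,Age,Category,Quantity,Price,Date
Ana,28,A,2,10.0,2023-03-01
Ben,35,B,1,30.0,2023-01-15
Ben,35,B,1,30.0,2023-01-15
Cara,,A,4,5.0,2023-02-10
Dev,42,B,3,20.0,2023-01-01
Eli,31,A,4,2.0,2023-04-20
"""


def preprocess(csv_text: str) -> pd.DataFrame:
    """Load and clean the sample data using calls listed in the cheat sheet."""
    # Load CSV data: read_csv accepts a file path or any file-like object.
    df_can = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

    # These methods return a new DataFrame by default, so assign the result back.
    df_can = df_can.drop_duplicates()                      # removes the repeated 'Ben' row
    df_can = df_can.fillna({"Age": df_can["Age"].mean()})  # fill the missing Age with the column mean
    df_can = df_can.rename(columns={"Age": "Years"})

    # Create a new column derived from existing ones.
    df_can["Total"] = df_can["Quantity"] * df_can["Price"]
    return df_can


df_can = preprocess(CSV_TEXT)

# Filtering rows with a Boolean condition.
older = df_can[df_can["Years"] > 30]

# Applying a function to a column (returns a new Series).
years_next = df_can["Years"].apply(lambda x: x + 1)

# Grouping and aggregating with a different function per column.
by_category = df_can.groupby("Category").agg({"Total": "mean", "Quantity": "sum"})

# Sorting rows, most recent date first.
newest_first = df_can.sort_values("Date", ascending=False)

# Positional vs label-based slicing: iloc excludes the end, loc includes it.
by_name = df_can.set_index("Name")
first_two = by_name.iloc[1:3]          # positions 1 and 2 -> Ben, Cara
ben_to_dev = by_name.loc["Ben":"Dev"]  # labels Ben through Dev -> Ben, Cara, Dev

print(df_can.isnull().sum().sum())     # count of remaining missing values
print(by_category)
print(df_can.describe())
```

Each cleaning step is assigned back to `df_can`. As a result, the final DataFrame has no duplicate rows and no missing ages. If those assignments were dropped, `df_can` would silently stay unchanged.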

Cheat Sheet: Plot Libraries

| Library | Main Purpose | Key Features | Programming Language | Level of Customization | Dashboard Capabilities | Types of Plots Possible |
|---|---|---|---|---|---|---|
| Matplotlib | General-purpose plotting | Comprehensive plot types and a wide variety of customization options | Python | High | Requires additional components and customization | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, heatmaps, etc. |
| Pandas | Primarily used for data manipulation, but also has plotting functionality | Easy to plot directly on Pandas data structures | Python | Medium | Can be combined with web frameworks to create dashboards | Line plots, scatter plots, bar charts, histograms, pie charts, box plots, etc. |
| Seaborn | Statistical data visualization | Stylish, specialized statistical plot types | Python | Medium | Can be combined with other libraries to display plots on dashboards | Heatmaps, violin plots, scatter plots, bar plots, count plots, etc. |
| Plotly | Interactive data visualization | Interactive, web-based visualizations | Python, R, JavaScript | High | The Dash framework is dedicated to building interactive dashboards | Line plots, scatter plots, bar charts, pie charts, 3D plots, choropleth maps, etc. |
| Folium | Geospatial data visualization | Interactive, customizable maps | Python | Medium | Can be integrated with other frameworks/libraries to incorporate maps into dashboards | Choropleth maps, point maps, heatmaps, etc. |
| PyWaffle | Plotting waffle charts | Waffle charts | Python | Low | Can be combined with other libraries to display waffle charts on dashboards | Waffle charts, square pie charts, donut charts, etc. |

https://author-ide.skills.network/render?token=eyJhbGciOiJIUzI…QiOjE2ODc0NTYyMDh9.FIodb3zHiLRl0-XWe5D_eOF4up7Iq2RkldXWE29rMk4
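The Matplotlib and Pandas rows differ mainly in their level of customization. The sketch below illustrates that contrast using made-up yearly counts for two hypothetical categories, "A" and "B". It uses the non-interactive Agg backend so it runs without a display. The left panel comes from a single `DataFrame.plot` call. The right panel builds a stacked bar chart from explicit Matplotlib calls.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: yearly counts for two categories, indexed by year.
df = pd.DataFrame(
    {"Year": [2019, 2020, 2021, 2022], "A": [120, 90, 150, 170], "B": [80, 60, 110, 95]}
).set_index("Year")

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pandas: one call plots every column as a line; Matplotlib does the drawing underneath.
df.plot(kind="line", ax=ax_line, title="Line plot via Pandas")
ax_line.set_ylabel("Count")

# Matplotlib: the same data as a stacked bar chart, built from explicit calls.
years = [str(y) for y in df.index]
ax_bar.bar(years, df["A"].to_numpy(), color="tab:blue", label="A")
ax_bar.bar(years, df["B"].to_numpy(), bottom=df["A"].to_numpy(), color="tab:orange", label="B")
ax_bar.set_title("Stacked bar chart via Matplotlib")
ax_bar.set_xlabel("Year")
ax_bar.set_ylabel("Count")
ax_bar.legend()

fig.tight_layout()

# Render to an in-memory PNG instead of a file or a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

With its default backend, Pandas plotting returns ordinary Matplotlib Axes. A quick `DataFrame.plot` chart can therefore still be refined afterwards with Matplotlib methods such as `set_ylabel`, as done for the left panel.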
