Data Cheat Sheet

Uploaded by

nikhildixit31

Pandas Reference Sheet

POWERED BY THE SCIENTISTS AT THE DATA INCUBATOR

Loading/exporting a data set

path_to_file: string indicating the path to the file, e.g., 'data/results.csv'

df = pd.read_csv(path_to_file) - reads a CSV file
df = pd.read_excel(path_to_file) - reads an Excel file
dfs = pd.read_html(path_to_file) - parses HTML to find all tables; returns a list of data frames, one per table
df.to_csv(path_to_file) - creates a CSV of the data frame

Examining the data

df.head(n) - returns first n rows
df.tail(n) - returns last n rows
df.describe() - returns summary statistics for each numerical column
df['State'].unique() - returns unique values for the column
df.columns - returns column names
df.shape - returns the number of rows and columns
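The loading and examining calls above can be tried without any file on disk. The sketch below is only an illustration: the CSV text is a hypothetical stand-in for a file such as 'data/results.csv', and it relies on read_csv accepting a file-like object in place of a path.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file such as 'data/results.csv'.
csv_text = """State,Capital,Population
Texas,Austin,28700000
New York,Albany,19540000
Washington,Olympia,7536000
"""

# read_csv accepts a path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)           # first 2 rows
last_row = df.tail(1)            # last row
n_rows, n_cols = df.shape        # number of rows and columns
column_names = list(df.columns)  # column labels as a plain list
states = df['State'].unique()    # distinct values of one column
summary = df.describe()          # statistics for the numerical column only

# With no path given, to_csv returns the CSV text; index=False omits row labels.
round_trip = df.to_csv(index=False)
```

Passing index=False is a choice made for this sketch; without it, to_csv also writes the row labels as the first column.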

Statistical operations

can be applied to both data frames and series/columns; pass numeric_only=True when the data frame also has non-numeric columns

df['Population'].sum() - sum of all values of a column
df.sum() - sum for all numerical columns
df.mean() - mean
df.std() - standard deviation
df.min() - minimum value
df.max() - maximum value
df.count() - count of values, excludes missing values
df['Population'].apply(func) - apply func to each value of column

Selecting and filtering

SELECTING COLUMNS
df['State'] - selects 'State' column
df[['State', 'Population']] - selects 'State' and 'Population' columns

SELECTING BY LABEL
df.loc['a'] - selects row by index label
df.loc['a', 'State'] - selects single value of row 'a' and column 'State'

SELECTING BY POSITION
df.iloc[0] - selects the row in position 0
df.iloc[0, 0] - selects single value by position at row 0 and column 0
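As a rough illustration of the selection and statistics calls, the sketch below rebuilds the sheet's three-state example data frame by hand. The variable names and the lambda passed to apply are assumptions made for the example only.

```python
import pandas as pd

# The sheet's example data frame, with index labels 'a', 'b', 'c'.
df = pd.DataFrame(
    {
        'State': ['Texas', 'New York', 'Washington'],
        'Capital': ['Austin', 'Albany', 'Olympia'],
        'Population': [28700000, 19540000, 7536000],
    },
    index=['a', 'b', 'c'],
)

# Selecting columns
state_col = df['State']                # one column, returned as a Series
subset = df[['State', 'Population']]   # several columns, returned as a data frame

# Selecting by label
row_a = df.loc['a']                    # whole row with index label 'a'
value = df.loc['a', 'State']           # single value: 'Texas'

# Selecting by position
first_row = df.iloc[0]                 # row at position 0
first_cell = df.iloc[0, 0]             # value at row 0, column 0

# Statistics on a column and on the whole frame
total = df['Population'].sum()
numeric_means = df.mean(numeric_only=True)  # skips the text columns
counts = df.count()                         # non-missing values per column

# apply calls the function once per value (hypothetical conversion to millions)
in_millions = df['Population'].apply(lambda p: p / 1_000_000)
```

Here numeric_only=True is used because the frame mixes text and numbers; on an all-numeric frame a plain df.mean() does the same job.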
FILTERING
df[df['Population'] > 20000000] - keeps only the rows meeting the condition
df.query("Population > 20000000") - keeps only the rows meeting the condition

   State       Capital  Population
a  Texas       Austin   28700000
b  New York    Albany   19540000
c  Washington  Olympia  7536000

Example data frame

Data cleaning and modifications

df['State'].isnull() - returns True for rows with a missing value in the column, False otherwise
df.dropna(axis=0) - drop rows containing missing values
df.dropna(axis=1) - drop columns containing missing values
df.fillna(0) - fill in missing values, here filled with 0
df.sort_values('Population', ascending=True) - sort rows by a column's values
df.set_index('State') - changes index to a specified column
df.reset_index() - makes the current index a column
df.rename(columns={'Population': 'Pop.'}) - renames columns
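The filtering and cleaning calls can be sketched on the same example data frame. The copy with a missing population value is a hypothetical addition, since the sheet's own data has no gaps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'State': ['Texas', 'New York', 'Washington'],
        'Capital': ['Austin', 'Albany', 'Olympia'],
        'Population': [28700000, 19540000, 7536000],
    },
    index=['a', 'b', 'c'],
)

# Filtering: both forms keep only the rows where the condition is True.
big = df[df['Population'] > 20000000]
big_q = df.query('Population > 20000000')

# Hypothetical copy with one missing population value, to exercise the cleaning calls.
with_gap = df.assign(Population=[28700000.0, np.nan, 7536000.0])

missing = with_gap['Population'].isnull()  # True only for row 'b'
rows_kept = with_gap.dropna(axis=0)        # drops row 'b'
cols_kept = with_gap.dropna(axis=1)        # drops the 'Population' column
filled = with_gap.fillna(0)                # replaces the gap with 0

# Modifications: each call returns a new data frame and leaves df unchanged.
by_pop = df.sort_values('Population', ascending=True)
by_state = df.set_index('State')
restored = by_state.reset_index()
renamed = df.rename(columns={'Population': 'Pop.'})
```

Because none of these calls modify df in place, the result has to be assigned to a name, as done above, to be kept.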
© 2019 Pragmatic Institute, LLC
Grouping and aggregation
grouped = df.groupby(by='col1') - creates a GroupBy object
grouped['col2'].mean() - mean value of 'col2' for each group
grouped.agg({'col2': 'mean', 'col3': ['mean', 'std']}) - apply different functions to different columns
grouped.apply(func) - apply func to each group
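A small sketch of the grouping calls, using hypothetical values for col1, col2 and col3 (the sheet gives only the column names). The helper spread is likewise made up for the example.

```python
import pandas as pd

# Hypothetical data; only the column names come from the sheet.
df = pd.DataFrame(
    {
        'col1': ['x', 'x', 'y', 'y'],
        'col2': [1.0, 3.0, 5.0, 7.0],
        'col3': [10.0, 20.0, 30.0, 50.0],
    }
)

grouped = df.groupby(by='col1')

# Mean of col2 within each group, indexed by the group key.
col2_means = grouped['col2'].mean()

# Different aggregations per column; the result has two-level column labels
# such as ('col3', 'std').
stats = grouped.agg({'col2': 'mean', 'col3': ['mean', 'std']})


def spread(group):
    """Hypothetical per-group function: largest col3 minus smallest col2."""
    return group['col3'].max() - group['col2'].min()


# Selecting the value columns first means func receives only those columns;
# whether the grouping column is also passed differs between pandas versions.
spreads = grouped[['col2', 'col3']].apply(spread)
```

The aggregation names are given as strings ('mean', 'std') here; this is one accepted spelling and avoids a separate NumPy import.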


Merging data frames

There are several ways to merge two data frames, depending on the value of method. The resulting indices are integers starting with zero.

df1.merge(df2, how=method, on='State')

Data frame df1

   State       Capital  Population
a  Texas       Austin   28700000
b  New York    Albany   19540000
c  Washington  Olympia  7536000

Data frame df2

   State       Highest Point
x  Washington  Mount Rainier
y  New York    Mount Marcy
z  Nebraska    Panorama Point

how='left'

   State       Capital  Population  Highest Point
0  Texas       Austin   28700000    NaN
1  New York    Albany   19540000    Mount Marcy
2  Washington  Olympia  7536000     Mount Rainier

how='inner'

   State       Capital  Population  Highest Point
0  New York    Albany   19540000    Mount Marcy
1  Washington  Olympia  7536000     Mount Rainier

how='right'

   State       Capital  Population  Highest Point
0  New York    Albany   19540000    Mount Marcy
1  Washington  Olympia  7536000     Mount Rainier
2  Nebraska    NaN      NaN         Panorama Point

how='outer'

   State       Capital  Population  Highest Point
0  Texas       Austin   28700000    NaN
1  New York    Albany   19540000    Mount Marcy
2  Washington  Olympia  7536000     Mount Rainier
3  Nebraska    NaN      NaN         Panorama Point

The row order shown for how='right' and how='outer' may differ between pandas versions; the set of rows is the same.
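The four merge methods can be compared side by side with a sketch like the one below, which rebuilds df1 and df2 from the tables above. The results and row_counts dictionaries are names chosen for this example only.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        'State': ['Texas', 'New York', 'Washington'],
        'Capital': ['Austin', 'Albany', 'Olympia'],
        'Population': [28700000, 19540000, 7536000],
    },
    index=['a', 'b', 'c'],
)

df2 = pd.DataFrame(
    {
        'State': ['Washington', 'New York', 'Nebraska'],
        'Highest Point': ['Mount Rainier', 'Mount Marcy', 'Panorama Point'],
    },
    index=['x', 'y', 'z'],
)

# One merged data frame per method, keyed by the value passed as how.
results = {
    how: df1.merge(df2, how=how, on='State')
    for how in ('left', 'inner', 'right', 'outer')
}

# Number of rows each method keeps.
row_counts = {how: len(merged) for how, merged in results.items()}

# States present in only one input show up with NaN in the other input's columns.
unmatched_left = results['left']['Highest Point'].isnull().sum()
unmatched_right = results['right']['Capital'].isnull().sum()
```

Only 'New York' and 'Washington' appear in both inputs, so the inner merge keeps two rows while the outer merge keeps all four states. The original labels 'a' to 'c' and 'x' to 'z' are replaced by a fresh integer index in every result.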

Register or learn more about other courses in our data curriculum by visiting pragmaticinstitute.com/data-science or calling 480.515.1411.
