PYTHON CHEAT SHEET FOR DATA SCIENCE

Importing Data
Any kind of data analysis starts with getting hold of some data. Pandas gives you plenty of options for
getting data into your Python workbook:

In [ ]: pd.read_csv(filename) # From a CSV file
pd.read_table(filename) # From a delimited text file (like TSV)
pd.read_excel(filename) # From an Excel file
pd.read_sql(query, connection_object) # Reads from a SQL table/database
pd.read_json(json_string) # Reads from a JSON formatted string, URL or file
pd.read_html(url) # Parses an HTML URL, string or file and extracts tables to a list of dataframes
pd.read_clipboard() # Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) # From a dict: keys for column names, values for data as lists
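
For instance, here is a minimal, self-contained sketch (the table contents and CSV text are invented for illustration) of the two most common entry points, pd.DataFrame(dict) and pd.read_csv; io.StringIO stands in for a real file path:

In [ ]: import io
import pandas as pd

# Build a DataFrame directly from a dict: keys become column names
df = pd.DataFrame({'city': ['Lima', 'Quito'], 'population': [9.7, 1.9]})

# Read the same data through read_csv; a filename would work the same way
csv_text = "city,population\nLima,9.7\nQuito,1.9\n"
df2 = pd.read_csv(io.StringIO(csv_text))
print(df2)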

Exploring Data
Once you have imported your data into a Pandas dataframe, you can use these methods to get a sense of
what the data looks like:

In [ ]: df.shape # Returns the number of rows and columns in the DataFrame
df.head(n) # Returns the first n rows of the DataFrame
df.tail(n) # Returns the last n rows of the DataFrame
df.info() # Index, datatype and memory information
df.describe() # Summary statistics for numerical columns
s.value_counts(dropna=False) # Views unique values and counts
df.apply(pd.Series.value_counts) # Unique values and counts for all columns
df.mean() # Returns the mean of all columns
df.corr() # Returns the correlation between columns in a DataFrame
df.count() # Returns the number of non-null values in each DataFrame column
df.max() # Returns the highest value in each column
df.min() # Returns the lowest value in each column
df.median() # Returns the median of each column
df.std() # Returns the standard deviation of each column
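
As a quick illustration (the toy DataFrame below is invented for the example), these methods can be called on any freshly imported frame:

In [ ]: import pandas as pd

df = pd.DataFrame({'species': ['cat', 'dog', 'dog'], 'weight': [4.0, 10.0, 12.0]})
print(df.shape)                                   # (3, 2)
print(df.head(2))                                 # first two rows
print(df.describe())                              # summary statistics for 'weight'
print(df['species'].value_counts(dropna=False))   # counts per unique value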

Selecting
Often, you might need to select a single element or a certain subset of the data to inspect it or perform
further analysis. These methods will come in handy:

In [ ]: df[col] # Returns the column with label col as a Series
df[[col1, col2]] # Returns columns as a new DataFrame
s.iloc[0] # Selection by position (selects the first element)
s.loc[0] # Selection by index label (selects the element labelled 0)
df.iloc[0,:] # First row
df.iloc[0,0] # First element of first column
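
To make the difference between position-based and label-based selection concrete, here is a small sketch (toy data with the default integer index, so .iloc and .loc happen to agree):

In [ ]: import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30], 'b': [1.5, 2.5, 3.5]})
s = df['a']             # a single column as a Series
print(s.iloc[0])        # 10 -> by position
print(s.loc[0])         # 10 -> by index label (default labels are 0, 1, 2)
print(df.iloc[0, :])    # first row
print(df.iloc[0, 0])    # 10 -> first element of first column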

Data Cleaning
If you’re working with real world data, chances are you’ll need to clean it up. These are some helpful
methods:

In [ ]: df.columns = ['a','b','c'] # Renames columns
pd.isnull() # Checks for null values, returns a Boolean array
pd.notnull() # Opposite of pd.isnull()
df.dropna() # Drops all rows that contain null values
df.dropna(axis=1) # Drops all columns that contain null values
df.dropna(axis=1,thresh=n) # Drops all columns with fewer than n non-null values
df.fillna(x) # Replaces all null values with x
s.fillna(s.mean()) # Replaces all null values with the mean (mean can be replaced with almost any function from the statistics section)
s.astype(float) # Converts the datatype of the Series to float
s.replace(1,'one') # Replaces all values equal to 1 with 'one'
s.replace([1,3],['one','three']) # Replaces all 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1) # Mass renaming of columns
df.rename(columns={'old_name': 'new_name'}) # Selective renaming
df.set_index('column_one') # Changes the index
df.rename(index=lambda x: x + 1) # Mass renaming of the index
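
A minimal sketch of these cleaning methods in action, using an invented frame with a few missing values:

In [ ]: import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 6.0]})
print(df.dropna())                    # keeps only the fully populated last row
print(df.dropna(axis=1, thresh=2))    # keeps only column 'a' (at least 2 non-null values)
print(df.fillna(df.mean()))           # fills each column's NaNs with that column's mean
df = df.rename(columns={'a': 'alpha'})  # selective renaming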

Filter, Sort and Group By


Methods for filtering, sorting and grouping your data:

In [ ]: df[df[col] > 0.5] # Rows where the col column is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] # Rows where 0.5 < col < 0.7
df.sort_values(col1) # Sorts values by col1 in ascending order
df.sort_values(col2,ascending=False) # Sorts values by col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]) # Sorts by col1 in ascending order, then col2 in descending order
df.groupby(col) # Returns a groupby object for values from one column
df.groupby([col1,col2]) # Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() # Returns the mean of the values in col2, grouped by the values in col1
df.pivot_table(index=col1, values=[col2,col3], aggfunc=np.mean) # Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) # Finds the average across all columns for every unique col1 group
df.apply(np.mean) # Applies a function across each column
df.apply(np.max, axis=1) # Applies a function across each row
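
Putting a few of these together on a made-up table (column names and values are purely illustrative):

In [ ]: import pandas as pd

df = pd.DataFrame({'team': ['a', 'a', 'b', 'b'],
                   'score': [1.0, 3.0, 2.0, 4.0],
                   'time': [10.0, 20.0, 30.0, 40.0]})
print(df[df['score'] > 1.5])                      # boolean filtering
print(df.sort_values('score', ascending=False))   # sort descending
print(df.groupby('team')['score'].mean())         # mean score per team
print(df.pivot_table(index='team', values=['score', 'time'], aggfunc='mean'))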

Joining and Combining


Methods for combining two dataframes:

In [ ]: df1.append(df2) # Adds the rows of df2 to the end of df1 (columns should be identical)
pd.concat([df1, df2],axis=1) # Adds the columns of df2 to the end of df1 (rows should be identical)
df1.join(df2,on=col1,how='inner') # SQL-style join of the columns in df1 with the columns of df2 where the rows for col1 have identical values; 'how' can be 'left', 'right', 'outer' or 'inner'
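
Note that DataFrame.append was removed in pandas 2.x; here is a small sketch of the same ideas with pd.concat and merge (merge is the column-based sibling of join; the two frames below are invented):

In [ ]: import pandas as pd

df1 = pd.DataFrame({'key': [1, 2], 'x': ['a', 'b']})
df2 = pd.DataFrame({'key': [2, 3], 'y': ['c', 'd']})
print(pd.concat([df1, df2], ignore_index=True))   # stack rows; missing columns become NaN
print(pd.concat([df1, df2], axis=1))              # stack columns side by side
print(df1.merge(df2, on='key', how='inner'))      # SQL-style inner join on 'key'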

Writing Data
And finally, when you have produced results with your analysis, there are several ways you can export your
data:

In [ ]: df.to_csv(filename) # Writes to a CSV file
df.to_excel(filename) # Writes to an Excel file
df.to_sql(table_name, connection_object) # Writes to a SQL table
df.to_json(filename) # Writes to a file in JSON format
df.to_html(filename) # Saves as an HTML table
df.to_clipboard() # Writes to the clipboard
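
For example, a quick round trip through CSV (the file name below is just a placeholder):

In [ ]: import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_csv('results.csv', index=False)   # 'results.csv' is a placeholder path
round_trip = pd.read_csv('results.csv')
print(round_trip.equals(df))            # True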

Machine Learning
The Scikit-Learn library contains useful methods for training and applying machine learning models. Our
Scikit-Learn tutorial provides more context for the code below.

For a complete list of the Supervised Learning, Unsupervised Learning, Dataset Transformation, and
Model Evaluation modules in Scikit-Learn, please refer to its user guide.

In [ ]: # Import libraries and modules
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn versions

# Load red wine data.
dataset_url = 'http://mlr.cs.umass.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(dataset_url, sep=';')

# Split data into training and test sets
y = data.quality
X = data.drop('quality', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123,
                                                    stratify=y)

# Declare data preprocessing steps
pipeline = make_pipeline(preprocessing.StandardScaler(),
                         RandomForestRegressor(n_estimators=100))

# Declare hyperparameters to tune
# ('auto' was removed from newer scikit-learn; 1.0 is the equivalent setting for a regressor)
hyperparameters = {'randomforestregressor__max_features': [1.0, 'sqrt', 'log2'],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# Tune model using cross-validation pipeline
clf = GridSearchCV(pipeline, hyperparameters, cv=10)
clf.fit(X_train, y_train)

# Refit on the entire training set
# No additional code needed if clf.refit == True (default is True)

# Evaluate model pipeline on test data
pred = clf.predict(X_test)
print(r2_score(y_test, pred))
print(mean_squared_error(y_test, pred))

# Save model for future use
joblib.dump(clf, 'rf_regressor.pkl')
# To load: clf2 = joblib.load('rf_regressor.pkl')

Conclusion
We’ve barely scratched the surface in terms of what you can do with Python and data science, but we hope
this cheat sheet has given you a taste of what you can do!

This post was kindly provided by our friend Kara Tan. Kara is a cofounder of Altitude Labs, a full-service app
design and development agency that specializes in data-driven design and personalization.
