Python For DS Cheat Sheet
Pandas, Numpy, and Scikit-Learn are among the most popular libraries for data science
and analysis with Python.
Numpy is used for lower level scientific computation. Pandas is built on top of Numpy and
designed for practical data analysis in Python. Scikit-Learn comes with many machine
learning models that you can use out of the box.
In this cheat sheet, we’ll summarize some of the most common and useful functionality
from these libraries. Let’s jump straight in!
Importing Data
Any kind of data analysis starts with getting hold of some data. Pandas gives you plenty
of options for getting data into your Python workbook:
pd.read_csv(filename) # From a CSV file
pd.read_table(filename) # From a delimited text file (like TSV)
pd.read_excel(filename) # From an Excel file
pd.read_sql(query, connection_object) # Reads from a SQL table/database
pd.read_json(json_string) # Reads from a JSON-formatted string, URL or file
pd.read_html(url) # Parses an HTML URL, string or file and extracts tables to a list of dataframes
pd.read_clipboard() # Takes the contents of your clipboard and passes it to read_table()
pd.DataFrame(dict) # From a dict: keys for column names, values for data as lists
Exploring Data
Once you have imported your data into a Pandas dataframe, you can use these methods
to get a sense of what the data looks like:
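(A sketch of the usual first-look methods; df stands for your DataFrame, s for a Series, and n for a number of rows. All of these are standard Pandas methods.)

df.head(n) # First n rows of the DataFrame
df.tail(n) # Last n rows of the DataFrame
df.shape # Number of rows and columns
df.info() # Index, datatype, and memory information
df.describe() # Summary statistics for numerical columns
s.value_counts(dropna=False) # Counts of unique values in the Series s
df.apply(pd.Series.value_counts) # Unique values and counts for every column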
Selecting
Often, you might need to select a single element or a certain subset of the data to inspect
it or perform further analysis. These methods will come in handy:
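(A sketch; df is a DataFrame and s is a Series, and all of these are standard Pandas indexing methods.)

df[col] # Returns the column with label col as a Series
df[[col1, col2]] # Returns columns as a new DataFrame
s.iloc[0] # Selection by position
s.loc['index_one'] # Selection by index label
df.iloc[0, :] # First row
df.iloc[0, 0] # First element of the first column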
Data Cleaning
If you’re working with real-world data, chances are you’ll need to clean it up. These are
some helpful methods:
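(A sketch of common cleaning idioms; all of these are standard Pandas methods, and the column names and values are placeholders.)

df.columns = ['a', 'b', 'c'] # Renames columns
pd.isnull(df) # Checks for null values; returns a Boolean DataFrame
pd.notnull(df) # Opposite of pd.isnull(df)
df.dropna() # Drops all rows that contain null values
df.dropna(axis=1) # Drops all columns that contain null values
df.fillna(x) # Replaces all null values with x
s.astype(float) # Converts the datatype of the Series s to float
s.replace(1, 'one') # Replaces all values equal to 1 with 'one'
df.rename(columns={'old_name': 'new_name'}) # Selective renaming of columns
df.set_index('column_one') # Changes the index

Filter, Sort, and Group By
Methods for filtering, sorting, and grouping your data: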
df[df[col] > 0.5] # Rows where the col column is greater than 0.5
df[(df[col] > 0.5) & (df[col] < 0.7)] # Rows where 0.5 < col < 0.7
df.sort_values(col1) # Sorts values by col1 in ascending order
df.sort_values(col2, ascending=False) # Sorts values by col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]) # Sorts values by col1 in ascending order, then col2 in descending order
df.groupby(col) # Returns a groupby object for values from one column
df.groupby([col1, col2]) # Returns a groupby object for values from multiple columns
df.groupby(col1)[col2].mean() # Returns the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any aggregation function: sum, max, count, etc.)
df.pivot_table(index=col1, values=[col2, col3], aggfunc=np.mean) # Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
df.groupby(col1).agg(np.mean) # Finds the average across all columns for every unique col1 group
df.apply(np.mean) # Applies a function across each column
df.apply(np.max, axis=1) # Applies a function across each row
Methods for combining two dataframes:
df1.append(df2) # Adds the rows of df2 to the end of df1 (columns should be identical); removed in pandas 2.0, use pd.concat([df1, df2]) instead
pd.concat([df1, df2], axis=1) # Adds the columns of df2 to the end of df1 (rows should be identical)
df1.join(df2, on=col1, how='inner') # SQL-style join: joins the columns of df1 with the columns of df2 where the rows for col1 have identical values. how can be one of 'left', 'right', 'outer', 'inner'
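As a quick illustration of combining (both DataFrames here are hypothetical; merge is often the simplest way to do a SQL-style join on a column):

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'x': ['a', 'b']})
df2 = pd.DataFrame({'id': [1, 2], 'y': ['c', 'd']})
print(pd.concat([df1, df2], axis=1)) # Side by side: columns id, x, id, y
print(df1.merge(df2, on='id')) # Inner join on the id column: columns id, x, y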
Writing Data
And finally, when you have produced results with your analysis, there are several ways
you can export your data:
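(A sketch; all of these are standard Pandas export methods, and the filename and connection arguments are placeholders.)

df.to_csv(filename) # Writes to a CSV file
df.to_excel(filename) # Writes to an Excel file
df.to_sql(table_name, connection_object) # Writes to a SQL table
df.to_json(filename) # Writes to a file in JSON format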
Machine Learning
The Scikit-Learn library contains useful methods for training and applying machine
learning models. Our Scikit-Learn tutorial provides more context for the code below.
For a complete list of the Supervised Learning, Unsupervised Learning, Dataset
Transformation, and Model Evaluation modules in Scikit-Learn, please refer to its user
guide.
# Import libraries and modules
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
import joblib # In older scikit-learn versions: from sklearn.externals import joblib

# Load red wine data.
dataset_url = 'http://mlr.cs.umass.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(dataset_url, sep=';')

# Split data into training and test sets
y = data.quality
X = data.drop('quality', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123,
                                                    stratify=y)

# Declare data preprocessing steps
pipeline = make_pipeline(preprocessing.StandardScaler(),
                         RandomForestRegressor(n_estimators=100))

# Declare hyperparameters to tune
# (note: 'auto' for max_features was removed in recent scikit-learn releases)
hyperparameters = {'randomforestregressor__max_features': ['auto', 'sqrt', 'log2'],
                   'randomforestregressor__max_depth': [None, 5, 3, 1]}

# Tune model using cross-validation pipeline
clf = GridSearchCV(pipeline, hyperparameters, cv=10)
clf.fit(X_train, y_train)

# Refit on the entire training set
# No additional code needed if clf.refit == True (default is True)

# Evaluate model pipeline on test data
pred = clf.predict(X_test)
print(r2_score(y_test, pred))
print(mean_squared_error(y_test, pred))

# Save model for future use
joblib.dump(clf, 'rf_regressor.pkl')
# To load: clf2 = joblib.load('rf_regressor.pkl')
Conclusion
We’ve barely scratched the surface of what you can do with Python and data
science, but we hope this cheat sheet has given you a taste of what’s possible!
This post was kindly provided by our friend Kara Tan. Kara is a cofounder of Altitude
Labs, a full-service app design and development agency that specializes in data-driven
design and personalization.