Python For Data Science Cheat Sheet
Pandas Basics
Learn Python for Data Science Interactively at www.DataCamp.com
The Pandas library is built on NumPy and provides easy-to-use
data structures and data analysis tools for the Python
programming language.
Use the following import convention:
>>> import pandas as pd
Pandas Data Structures
Series
A one-dimensional labeled array capable of holding any data type.
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
DataFrame
A two-dimensional labeled data structure with columns of potentially different types.
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])
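The two structures above can be tried end to end with a short script. This is a minimal sketch that reuses the sheet's own `s`, `data`, and `df`; the `capital` variable is only an illustrative name.

```python
import pandas as pd

# A Series pairs each value with an index label.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# A DataFrame is a set of columns that share one row index.
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(s.index.tolist())        # ['a', 'b', 'c', 'd']
print(df.shape)                # (3, 3)

# Selecting one column of a DataFrame returns a Series.
capital = df['Capital']
print(type(capital).__name__)  # Series
```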
Selection (also see NumPy Arrays)
Getting
>>> s['b']  # Get one element
-5
>>> df[1:]  # Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528
Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]  # Select single value by row & column position
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']  # Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position
The .ix indexer was removed in pandas 1.0; use .loc (labels) or .iloc (positions) instead.
>>> df.loc[2]  # Select a single row from a subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']  # Select a single column from a subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']  # Select rows and columns
'New Delhi'
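Label-based and position-based selection only give the same answer when the row labels happen to be 0, 1, 2. The sketch below assumes a hypothetical index of 10, 20, 30 (not from the sheet) to make the difference between `.loc`/`.at` and `.iloc`/`.iat` visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Country': ['Belgium', 'India', 'Brazil'],
     'Capital': ['Brussels', 'New Delhi', 'Brasília']},
    index=[10, 20, 30],  # hypothetical non-default row labels
)

by_position = df.iloc[1, 1]       # second row, second column
by_label = df.loc[20, 'Capital']  # row labelled 20, column 'Capital'
row = df.loc[30]                  # one whole row, returned as a Series
column = df.loc[:, 'Capital']     # one whole column, returned as a Series

print(by_position)     # New Delhi
print(by_label)        # New Delhi
print(row['Country'])  # Brazil
print(df.iat[0, 0], df.at[10, 'Country'])  # Belgium Belgium
```

Here `df.loc[1]` would raise a `KeyError`, because no row carries the label 1.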
Boolean Indexing
>>> s[~(s > 1)]  # Series s where value is not >1
>>> s[(s < -1) | (s > 2)]  # s where value is <-1 or >2
>>> df[df['Population'] > 1200000000]  # Keep only the rows that pass the filter
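A boolean mask is itself a Series of True/False values, and indexing with it keeps the entries marked True. The sketch below replays the three filters on the sheet's data; the names `mask`, `not_gt_1`, `outside`, and `big` are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

mask = s > 1                     # True for a, c, d
not_gt_1 = s[~mask]              # ~ negates the mask
outside = s[(s < -1) | (s > 2)]  # | is element-wise OR; parentheses are required
big = df[df['Population'] > 1200000000]

print(not_gt_1.to_dict())        # {'b': -5}
print(outside.index.tolist())    # ['a', 'b', 'c', 'd']
print(big['Country'].tolist())   # ['India']
```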
Setting
>>> s['a'] = 6  # Set index a of Series s to 6
Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)  # Apply function to each column
>>> df.applymap(f)  # Apply function element-wise (named DataFrame.map from pandas 2.1)
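With `f = lambda x: x*2` both calls give the same result, so the difference is easier to see with a function that only makes sense on a whole column. This sketch assumes a small hypothetical numeric frame `nums`, and picks `DataFrame.map` when the installed pandas provides it (2.1 and later), falling back to `applymap` otherwise.

```python
import pandas as pd

nums = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 40]})  # hypothetical data

# apply hands each column (a Series) to the function.
col_range = nums.apply(lambda col: col.max() - col.min())
print(col_range.tolist())     # [2, 30]

# Element-wise application calls the function once per cell.
elementwise = nums.map if hasattr(nums, 'map') else nums.applymap
doubled = elementwise(lambda v: v * 2)
print(doubled['y'].tolist())  # [20, 40, 80]
```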
Retrieving Series/DataFrame Information
Basic Information
>>> df.shape  # (rows, columns)
>>> df.index  # Describe index
>>> df.columns  # Describe DataFrame columns
>>> df.info()  # Info on DataFrame
>>> df.count()  # Number of non-NA values
Summary
>>> df.sum()  # Sum of values
>>> df.cumsum()  # Cumulative sum of values
>>> df.min()  # Minimum values
>>> df.max()  # Maximum values
>>> df.idxmin()  # Index label of the minimum value
>>> df.idxmax()  # Index label of the maximum value
>>> df.describe()  # Summary statistics
>>> df.mean()  # Mean of values
>>> df.median()  # Median of values
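Most of these summaries are only meaningful for numeric data, and recent pandas versions may raise on text columns for calls such as `mean()`. The sketch below therefore assumes a numeric-only frame, with the country names moved into the index (a choice made here for illustration, not taken from the sheet).

```python
import pandas as pd

df = pd.DataFrame(
    {'Population': [11190846, 1303171035, 207847528]},
    index=['Belgium', 'India', 'Brazil'],
)
pop = df['Population']

print(df.shape)               # (3, 1)
print(int(df.count()['Population']))  # 3
print(int(pop.sum()))         # 1522209409
print(pop.cumsum().tolist())  # [11190846, 1314361881, 1522209409]
print(pop.idxmin())           # Belgium
print(pop.idxmax())           # India
print(float(pop.median()))    # 207847528.0
```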
Dropping
>>> s.drop(['a', 'c'])  # Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)  # Drop values from columns (axis=1)
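By default `drop` returns a new object and leaves the original untouched, which is easy to overlook. A minimal sketch, with `dropped` and `without_country` as illustrative names:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília']})

dropped = s.drop(['a', 'c'])                # removes rows by label
without_country = df.drop('Country', axis=1)  # removes a column

print(dropped.index.tolist())            # ['b', 'd']
print(s.index.tolist())                  # ['a', 'b', 'c', 'd']  (unchanged)
print(without_country.columns.tolist())  # ['Capital']
print(df.columns.tolist())               # ['Country', 'Capital']  (unchanged)
```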
Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don’t overlap:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with
the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)
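The alignment behaviour can be checked with the sheet's own `s` and `s3`. Note that the outputs shown assume the original `s` (with `s['a'] == 3`), so this sketch recreates it rather than reusing a Series modified elsewhere.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Labels are matched first; 'b' has no partner in s3, so the result is NaN.
plain = s + s3
print(plain.isna().tolist())  # [False, True, False, False]

# fill_value stands in for the missing side before the operation runs.
filled = s.add(s3, fill_value=0)
print(filled.tolist())        # [10.0, -5.0, 5.0, 7.0]

# For 'b' the subtraction below computes -5 - 2.
print(s.sub(s3, fill_value=2)['b'])  # -7.0
```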
Sort & Rank
>>> df.sort_index()  # Sort by labels along an axis
>>> df.sort_values(by='Country')  # Sort by the values of a column
>>> s.sort_values()  # Sort a Series by its values
>>> df.rank()  # Assign ranks to entries
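Current pandas sorts by values with `sort_values` and by labels with `sort_index`; the older `by=` argument to `sort_index` and `Series.order()` are no longer available. A minimal sketch on the sheet's data, with illustrative variable names:

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

by_country = df.sort_values(by='Country')      # sort rows by a column's values
newest_first = df.sort_index(ascending=False)  # sort rows by their labels
ordered = s.sort_values()                      # sort a Series by its values
ranks = s.rank()                               # 1.0 = smallest value

print(by_country['Country'].tolist())  # ['Belgium', 'Brazil', 'India']
print(newest_first.index.tolist())     # [2, 1, 0]
print(ordered.index.tolist())          # ['b', 'a', 'd', 'c']
print(ranks.tolist())                  # [2.0, 1.0, 4.0, 3.0]
```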
I/O
Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')
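Writing is a method of the DataFrame, while reading is a top-level pandas function. The sketch below shows a round trip through an in-memory buffer (`io.StringIO`, chosen here so that no file is created) and what `header=None` with `nrows` does to the same text.

```python
import io
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India'],
                   'Population': [11190846, 1303171035]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves the row labels out
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored['Country'].tolist())  # ['Belgium', 'India']

# header=None treats the first line as data; nrows limits the rows read.
buffer.seek(0)
raw = pd.read_csv(buffer, header=None, nrows=2)
print(raw.shape)       # (2, 2)
print(raw.iloc[0, 0])  # Country
```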
Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
Asking For Help
>>> help(pd.Series.loc)
Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
>>> df.to_sql('myDf', engine)
read_sql() is a convenience wrapper around read_sql_table() and
read_sql_query()
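A write-then-query round trip can be tried without installing SQLAlchemy. This sketch swaps the engine for a standard-library `sqlite3` connection, which pandas accepts for `to_sql` and `read_sql_query`; `read_sql_table` is left out because it needs SQLAlchemy. The table name and the query are illustrative.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Population': [11190846, 1303171035, 207847528]})

conn = sqlite3.connect(':memory:')        # throwaway in-memory database
df.to_sql('my_table', conn, index=False)  # DataFrame method, not pd.to_sql

query = ("SELECT Country FROM my_table "
         "WHERE Population > 200000000 ORDER BY Population DESC")
result = pd.read_sql_query(query, conn)
conn.close()

print(result['Country'].tolist())  # ['India', 'Brazil']
```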
