Python For Data Science Cheat Sheet
Pandas Basics
Learn Python for Data Science Interactively at www.DataCamp.com
Series
A one-dimensional labeled array capable of holding any data type
(Figure: a Series with index labels a, b, c, d and values 3, -5, 7, 4)
DataFrame
A two-dimensional labeled data structure with columns of potentially different types
(Figure: a DataFrame with its index and columns labeled)
The Pandas library is built on NumPy and provides easy-to-use
data structures and data analysis tools for the Python
programming language.
Use the following import convention:
>>> import pandas as pd
Pandas Data Structures
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
...         'Capital': ['Brussels', 'New Delhi', 'Brasília'],
...         'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
...                   columns=['Country', 'Capital', 'Population'])
Selection
Also see NumPy Arrays
Getting
>>> s['b']  # Get one element
-5
>>> df[1:]  # Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528
Selecting, Boolean Indexing & Setting
By Position
>>> df.iloc[0, 0]  # Select single value by row & column position
'Belgium'
>>> df.iat[0, 0]
'Belgium'
By Label
>>> df.loc[0, 'Country']  # Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
By Label/Position
The .ix indexer was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc (labels) or .iloc (positions) instead.
>>> df.iloc[2]  # Select single row of subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital']  # Select a single column of subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital']  # Select rows and columns
'New Delhi'
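The selection accessors above differ mainly in what they return. The sketch below, which rebuilds the cheat sheet's `df`, contrasts scalar indexers (a single value) with list indexers (a 1x1 DataFrame) and shows `.loc`/`.iloc` standing in for the removed `.ix`. The variable names are illustrative only.

```python
import pandas as pd

# Same sample data as the cheat sheet.
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# Scalar indexers return a single value.
by_position = df.iloc[0, 0]        # row position 0, column position 0
by_label = df.loc[0, 'Country']    # row label 0, column label 'Country'
fast_position = df.iat[0, 0]       # scalar-only accessor, by position
fast_label = df.at[0, 'Country']   # scalar-only accessor, by label

# List indexers keep both dimensions: the result is a 1x1 DataFrame.
one_by_one = df.iloc[[0], [0]]

# Replacements for the removed .ix accessor.
row = df.iloc[2]                   # one row, returned as a Series
column = df.loc[:, 'Capital']      # one column, returned as a Series
cell = df.loc[1, 'Capital']        # one cell, returned as a scalar
```

Note that `.iat` and `.at` take square brackets like the other indexers; calling them with parentheses raises an error.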
Boolean Indexing
>>> s[~(s > 1)]  # Series s where value is not >1
>>> s[(s < -1) | (s > 2)]  # s where value is <-1 or >2
>>> df[df['Population'] > 1200000000]  # Keep only the DataFrame rows that satisfy the condition
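As a rough illustration of the boolean masks above, the following sketch rebuilds `s` and `df` and stores each filtered result. The `&` example is an addition that does not appear in the cheat sheet; it is included only to show the counterpart of `|`.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# ~ negates a mask: keep values that are NOT greater than 1.
not_gt_one = s[~(s > 1)]

# | combines masks with OR; each condition needs its own parentheses.
outside = s[(s < -1) | (s > 2)]

# & combines masks with AND (not in the cheat sheet, shown for contrast).
between = s[(s > 0) & (s < 5)]

# A mask built from one column filters whole DataFrame rows.
large = df[df['Population'] > 1200000000]
```

The parentheses matter because `|`, `&`, and `~` bind more tightly than comparison operators in Python.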
Setting
>>> s['a'] = 6  # Set the value at index label 'a' of Series s to 6
Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f)  # Apply function along an axis (column-wise by default)
>>> df.applymap(f)  # Apply function element-wise (deprecated since pandas 2.1 in favor of df.map(f))
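With `f = lambda x: x*2` the two calls above give the same result, so the difference is easier to see with a function that needs a whole column. The sketch below uses a small hypothetical numeric frame `nums` (not part of the cheat sheet) and picks `DataFrame.map` when the installed pandas provides it, falling back to `applymap` otherwise.

```python
import pandas as pd

# Hypothetical numeric data, chosen so both operations are well defined.
nums = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# apply passes each column (a Series) to the function,
# so the function can aggregate over the column.
col_ranges = nums.apply(lambda col: col.max() - col.min())

# Element-wise application passes one value at a time.
# DataFrame.map exists in pandas 2.1+; older versions only have applymap.
elementwise = nums.map if hasattr(nums, 'map') else nums.applymap
doubled = elementwise(lambda v: v * 2)
```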
Retrieving Series/DataFrame Information
Basic Information
>>> df.shape  # (rows, columns)
>>> df.index  # Describe index
>>> df.columns  # Describe DataFrame columns
>>> df.info()  # Info on DataFrame
>>> df.count()  # Number of non-NA values
Summary
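>>> df.sum()  # Sum of values
>>> df.cumsum()  # Cumulative sum of values
>>> df.min()  # Minimum values
>>> df.max()  # Maximum values
>>> df.idxmin()  # Index label of the minimum value
>>> df.idxmax()  # Index label of the maximum value
>>> df.describe()  # Summary statistics
>>> df.mean(numeric_only=True)  # Mean of values (numeric columns only)
>>> df.median(numeric_only=True)  # Median of values (numeric columns only)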
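Because the cheat sheet's `df` mixes string and numeric columns, several of the summary methods are easiest to read on the numeric column alone. The sketch below is one way to do that, assuming the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# Reductions on a single numeric column return a scalar.
total_population = df['Population'].sum()

# Cumulative sum keeps the original length and index.
running_total = df['Population'].cumsum()

# idxmin/idxmax return the index LABEL of the extreme value, not the value.
smallest_label = df['Population'].idxmin()
largest_label = df['Population'].idxmax()
largest_country = df.loc[largest_label, 'Country']

# numeric_only=True skips the string columns when reducing the whole frame.
mean_by_column = df.mean(numeric_only=True)
```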
Dropping
>>> s.drop(['a', 'c'])  # Drop values from rows (axis=0)
>>> df.drop('Country', axis=1)  # Drop values from columns (axis=1)
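One detail the drop calls above do not show is that `drop` returns a new object and leaves the original untouched. A short sketch, assuming the same `s` and `df` as earlier; the `columns=` keyword is an equivalent spelling of `axis=1` that the cheat sheet does not use.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# drop returns a copy without the given row labels; s itself is unchanged.
trimmed = s.drop(['a', 'c'])

# axis=1 drops a column instead of a row.
no_country = df.drop('Country', axis=1)

# Equivalent spelling using the columns keyword.
no_country_kw = df.drop(columns='Country')
```

To keep the result, assign it back to a name, for example `s = s.drop(['a', 'c'])`.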
Data Alignment
Internal Data Alignment
NA values are introduced in the indices that don't overlap (the outputs below use s as originally defined, before the assignment s['a'] = 6):
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0
Arithmetic Operations with Fill Methods
You can also do the internal data alignment yourself with the help of the fill methods:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)
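A minimal sketch of the alignment behavior described in this section, assuming `s` as originally defined (with `s['a']` still equal to 3). It shows that `fill_value` substitutes for the missing side of the operation before the arithmetic is carried out.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])

# Plain arithmetic aligns on index labels; 'b' has no partner in s3,
# so the result at 'b' is NaN.
plain = s + s3

# fill_value stands in for the missing operand: at 'b', -5 + 0.
added = s.add(s3, fill_value=0)

# The same rule applies to the other methods: at 'b', -5 - 2.
subtracted = s.sub(s3, fill_value=2)
```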
Sort & Rank
>>> df.sort_index()  # Sort by labels along an axis
>>> df.sort_values(by='Country')  # Sort by the values along an axis
>>> df.rank()  # Assign ranks to entries
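The sort and rank calls above can be sketched on the sample data as follows. The `ascending=False` argument is an addition not shown in the cheat sheet, and ranking is applied to the numeric column only to keep the result easy to read.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# Sort rows by a column's values; the original index labels travel with the rows.
by_country = df.sort_values(by='Country')

# Descending order (ascending=False is not used in the cheat sheet).
by_population_desc = df.sort_values(by='Population', ascending=False)

# sort_index restores label order after a value sort.
restored = by_country.sort_index()

# rank assigns 1.0 to the smallest value, 2.0 to the next, and so on.
population_rank = df['Population'].rank()
```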
(Figure: contents of the DataFrame df)
   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
I/O
Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')
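To try the CSV calls without touching the file system, the following sketch writes to an in-memory `io.StringIO` buffer instead of `'file.csv'`. The buffer and the `index=False` argument are assumptions made for the example, not part of the cheat sheet.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# Write CSV text into an in-memory buffer; index=False omits the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back with the first line treated as the header (the default).
buffer.seek(0)
restored = pd.read_csv(buffer)

# header=None treats the first line as data and numbers the columns 0, 1, 2;
# nrows=2 stops after two lines.
buffer.seek(0)
raw = pd.read_csv(buffer, header=None, nrows=2)
```

With `header=None`, the first row of `raw` holds the column names as ordinary strings, which is why that option is usually reserved for files that have no header line.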
Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')
Asking For Help
>>> help(pd.Series.loc)
Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
>>> df.to_sql('myDf', engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().
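A self-contained round trip through SQL can be sketched with the standard-library `sqlite3` module in place of a SQLAlchemy engine; this substitution is an assumption made so the example has no extra dependency. `read_sql_table()` is left out because it requires SQLAlchemy. The table name `my_table` follows the cheat sheet, and the WHERE clause is illustrative.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# An in-memory SQLite database, discarded when the connection closes.
conn = sqlite3.connect(':memory:')

# to_sql is a DataFrame method; index=False skips writing the row labels.
df.to_sql('my_table', conn, index=False)

# Run a query and get the result back as a DataFrame.
result = pd.read_sql_query(
    "SELECT Country FROM my_table WHERE Population > 1000000000;", conn
)

# Read the whole table back for comparison.
everything = pd.read_sql_query("SELECT * FROM my_table;", conn)

conn.close()
```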
