
Python For Data Science: Pandas Basics Cheat Sheet

This document provides a summary of key Pandas functions for working with DataFrames and Series. It covers reading and writing data to common file types like CSV and Excel. It also discusses selecting and filtering DataFrames, applying functions, descriptive statistics, and alignment of indexes during arithmetic operations. The Pandas library is built on NumPy and provides easy-to-use data structures and analysis tools for Python.

Uploaded by locuto
Copyright © All Rights Reserved

Learn Pandas Basics online at www.DataCamp.com

> Retrieving Series/DataFrame Information

Basic Information
>>> df.shape #(rows,columns)
>>> df.index #Describe index
>>> df.columns #Describe DataFrame columns
>>> df.info() #Info on DataFrame
>>> df.count() #Number of non-NA values per column

> I/O

Read and Write to CSV
>>> pd.read_csv('file.csv', header=None, nrows=5)
>>> df.to_csv('myDataFrame.csv')

Read and Write to Excel
>>> pd.read_excel('file.xlsx')
>>> df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
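The CSV entries and the Basic Information attributes above can be tried together without touching the disk. This is a minimal sketch that assumes an in-memory string (via `io.StringIO`) stands in for `file.csv`. The rows reuse the country data from this sheet, and `csv_text`, `out` and `back` are illustrative names only.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for 'file.csv': three data rows, no header line
csv_text = (
    "Belgium,Brussels,11190846\n"
    "India,New Delhi,1303171035\n"
    "Brazil,Brasília,207847528\n"
)

# header=None: the first line is data, so pandas numbers the columns 0, 1, 2
# nrows=2: stop after the first two data rows
df = pd.read_csv(io.StringIO(csv_text), header=None, nrows=2)

print(df.shape)             # (2, 3)
print(list(df.columns))     # [0, 1, 2]
print(list(df.index))       # [0, 1]
print(df.count().tolist())  # [2, 2, 2]

# Without a path argument, to_csv returns the CSV text instead of writing a file
out = df.to_csv(index=False)

# Reading it back: this time the first line ("0,1,2") is treated as a header row
back = pd.read_csv(io.StringIO(out))
print(back.shape)           # (2, 3)
print(back.iloc[1, 1])      # New Delhi
```

Because `header=None` produced integer column labels, the round trip writes them as a header line. On reading, those labels come back as the strings `'0'`, `'1'`, `'2'`.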


Read multiple sheets from the same file
>>> xlsx = pd.ExcelFile('file.xls')
>>> df = pd.read_excel(xlsx, 'Sheet1')

Read and Write to SQL Query or Database Table
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:')
>>> pd.read_sql("SELECT * FROM my_table;", engine)
>>> pd.read_sql_table('my_table', engine)
>>> pd.read_sql_query("SELECT * FROM my_table;", engine)
read_sql() is a convenience wrapper around read_sql_table() and read_sql_query()
>>> df.to_sql('myDf', engine)

> Summary
>>> df.sum() #Sum of values
>>> df.cumsum() #Cumulative sum of values
>>> df.min()/df.max() #Minimum/maximum values
>>> df.idxmin()/df.idxmax() #Index label of the minimum/maximum value
>>> df.describe() #Summary statistics
>>> df.mean() #Mean of values
>>> df.median() #Median of values
In pandas 2.0 and later, mean() and median() raise a TypeError on text columns; pass numeric_only=True to restrict them to numeric columns.

> Applying Functions
>>> f = lambda x: x*2
>>> df.apply(f) #Apply function to each column
>>> df.applymap(f) #Apply function element-wise (renamed DataFrame.map in pandas 2.1)

> Pandas Data Structures
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.
Use the following import convention:
>>> import pandas as pd
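This sketch follows the import convention just above. It builds the same `s` and `df` used in the Series and DataFrame entries of this section and checks the difference in dimensionality between them. It also confirms that each DataFrame column keeps its own dtype. The values are copied from this sheet; nothing else is assumed.

```python
import pandas as pd

# Series: a one-dimensional array whose entries are labeled by an index
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# DataFrame: a two-dimensional table; each column can have its own type
df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

print(s.ndim, df.ndim)  # 1 2
print(s['b'])           # -5
print(df.shape)         # (3, 3)

# The text columns and the integer Population column keep separate dtypes
print(df.dtypes)
```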

Series
A one-dimensional labeled array capable of holding any data type

a    3
b   -5
c    7
d    4
(the labels a-d form the index)

>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

DataFrame
A two-dimensional labeled data structure with columns of potentially different types

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
(the row labels 0-2 form the index; Country, Capital and Population are the columns)

>>> data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasília'],
            'Population': [11190846, 1303171035, 207847528]}
>>> df = pd.DataFrame(data,
                      columns=['Country', 'Capital', 'Population'])

> Data Alignment

Internal Data Alignment
NA values are introduced at index labels that do not appear in both Series:
>>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd'])
>>> s + s3
a    10.0
b     NaN
c     5.0
d     7.0

Arithmetic Operations with Fill Methods
You can also do the alignment yourself with the fill methods, which substitute fill_value for a label missing from one side before computing:
>>> s.add(s3, fill_value=0)
a    10.0
b    -5.0
c     5.0
d     7.0
>>> s.sub(s3, fill_value=2)
>>> s.div(s3, fill_value=4)
>>> s.mul(s3, fill_value=3)

> Selection (also see NumPy Arrays)

Getting
>>> s['b'] #Get one element
-5
>>> df[1:] #Get subset of a DataFrame
  Country    Capital  Population
1   India  New Delhi  1303171035
2  Brazil   Brasília   207847528

Selecting, Boolean Indexing & Setting

By Position
>>> df.iloc[0, 0] #Select single value by row & column
'Belgium'
>>> df.iat[0, 0]
'Belgium'

By Label
>>> df.loc[0, 'Country'] #Select single value by row & column labels
'Belgium'
>>> df.at[0, 'Country']
'Belgium'
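The By Position and By Label entries rely on scalar arguments. This minimal sketch rebuilds the sheet's `df` and checks that the four scalar forms agree. It also shows that passing lists instead, as in `df.iloc[[0], [0]]`, returns a 1-by-1 DataFrame rather than the bare value. The names `by_position`, `fast_position`, `by_label`, `fast_label` and `sub` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# Scalar row/column arguments return a single value
by_position = df.iloc[0, 0]       # row position 0, column position 0
fast_position = df.iat[0, 0]      # same lookup through the scalar-only accessor
by_label = df.loc[0, 'Country']   # row label 0, column label 'Country'
fast_label = df.at[0, 'Country']  # same lookup through the scalar-only accessor
print(by_position, fast_position, by_label, fast_label)  # Belgium Belgium Belgium Belgium

# List arguments keep both axes, so the result is a 1-by-1 DataFrame
sub = df.iloc[[0], [0]]
print(type(sub).__name__, sub.shape)  # DataFrame (1, 1)

# Plain [] with a slice selects rows by position
print(df[1:]['Country'].tolist())     # ['India', 'Brazil']
```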

By Label/Position
(df.ix was deprecated in pandas 0.20 and removed in 1.0; use .iloc for positions and .loc for labels)
>>> df.iloc[2] #Select a single row or subset of rows
Country          Brazil
Capital        Brasília
Population    207847528
>>> df.loc[:, 'Capital'] #Select a single column or subset of columns
0     Brussels
1    New Delhi
2     Brasília
>>> df.loc[1, 'Capital'] #Select rows and columns
'New Delhi'

Boolean Indexing
>>> s[~(s > 1)] #Series s where value is not >1
>>> s[(s < -1) | (s > 2)] #s where value is <-1 or >2
>>> df[df['Population']>1200000000] #Rows where Population exceeds 1.2 billion

Setting
>>> s['a'] = 6 #Set the value at index label 'a' of Series s to 6

> Dropping
>>> s.drop(['a', 'c']) #Drop values from rows (axis=0)
>>> df.drop('Country', axis=1) #Drop values from columns (axis=1)

> Asking For Help
>>> help(pd.Series.loc)

> Sort & Rank
>>> df.sort_index() #Sort by labels along an axis
>>> df.sort_values(by='Country') #Sort by the values along an axis
>>> df.rank() #Assign ranks to entries
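This is a minimal sketch of the Dropping and Sort & Rank entries, run on a rebuilt copy of the sheet's `df`. It checks that these methods return new objects and leave `df` unchanged. It also shows that sorting by values carries the original index labels along with each row. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'],
                   'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                   'Population': [11190846, 1303171035, 207847528]})

# drop() returns a new DataFrame; df keeps all three columns
no_country = df.drop('Country', axis=1)
print(list(no_country.columns))  # ['Capital', 'Population']
print(list(df.columns))          # ['Country', 'Capital', 'Population']

# sort_values reorders rows; each row keeps its original index label
by_country = df.sort_values(by='Country')
print(by_country['Country'].tolist())  # ['Belgium', 'Brazil', 'India']
print(by_country.index.tolist())       # [0, 2, 1]

# sort_index restores label order
print(by_country.sort_index().index.tolist())  # [0, 1, 2]

# rank() gives rank 1 to the smallest value; ties would share an average rank
print(df['Population'].rank().tolist())  # [1.0, 3.0, 2.0]
```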
