This document is a cheat sheet for common Pandas operations: 1. How to install Pandas with pip and import it under the conventional alias pd. 2. Conventions for reading and writing dataframes and for exploring data through masking and filtering. 3. Common operations such as sorting, grouping, pivoting, and melting data between long and wide formats.


PANDAS cheat-sheet

1 Installing and Importing

Installing
pip install pandas

Importing (convention)
import pandas as pd
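A quick sanity check after installing: the import below follows the convention above, and printing the version confirms the install worked (the exact version string will vary).

```python
# Conventional import; pd is the community-standard alias
import pandas as pd

print(pd.__version__)  # prints the installed version string
```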

2 Reading and Writing data

Reading data
df = pd.read_csv('filename.csv')
# Can extend for JSON, Excel types too, using pd.read_json / pd.read_excel, etc.

Writing data
df.to_csv('filename.csv')
# Can extend for JSON, Excel, and such too, using df.to_json / df.to_excel, etc.


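A minimal round-trip sketch of the read/write calls above; the filename and data are invented for illustration.

```python
import pandas as pd

# Hypothetical small dataframe written to CSV and read back
df = pd.DataFrame({"name": ["a", "b"], "id": [1, 2]})
df.to_csv("example.csv", index=False)  # index=False skips writing the row index

df2 = pd.read_csv("example.csv")
print(df2.shape)  # (2, 2)
```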
3 Series and Dataframes

Creating a series
pd.Series(['a', 'b', 'c'])

Creating a dataframe
Row oriented:
pd.DataFrame([['a', 1], ['b', 2]], columns=['name', 'id'])
Column oriented:
pd.DataFrame({'name': ['a', 'b'], 'id': [1, 2]})
Both produce:
  name  id
0    a   1
1    b   2
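The two construction styles above can be checked against each other; this small sketch uses the same toy values as the section.

```python
import pandas as pd

# Row-oriented and column-oriented construction give the same dataframe
row_df = pd.DataFrame([["a", 1], ["b", 2]], columns=["name", "id"])
col_df = pd.DataFrame({"name": ["a", "b"], "id": [1, 2]})

print(row_df.equals(col_df))  # True
```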

4 Info extraction

Shape
df.shape
(Returns a tuple representing the dimensionality of the DataFrame, e.g. (2, 3) for 2 rows and 3 columns.)

Head
df.head(n)  (first n rows, default 5)

Tail
df.tail(n)  (last n rows, default 5)

Info
df.info()  (returns info of all columns)

Describe
df.describe()  (gives statistical information of the data)

Built-in ops
Built-in ops such as mean, min, max, etc.
E.g., df['col1'].min(), df['col1'].count(), etc.

5 Accessing

Direct accessing
df['col']
Accessing a row:
df.loc[ei]   # ei here is the explicit index
df.iloc[ii]  # ii here is the implicit index
Accessing a column:
df['column_name']      # for a single column
df[['col1', 'col2']]   # for multiple columns

Slicing
Row:
df.loc[1:3]   (1 and 3 are the explicit indices here)
or df.iloc[2:4]  (2 and 4 are the implicit indices here)
Column:
df.loc[:, 'a':'b']
Both row and column:
df.loc[1:3, 'a':'b']  (1 and 3 are explicit indices here)

Feature exploration (masking, filtering)
Masking
df['col'] > value
Creates a mask based on the required condition. E.g. df['age'] > 30

Filtering
Filters data based on conditions.
df.loc[(df['col1'] == val1) & (df['col2'] == val2)]
E.g.
df.loc[(df['month'] == 'January') & (df['year'] == '2022')]
# filters out data for January 2022
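A small runnable sketch of explicit vs. implicit indexing and masking; the ages, cities, and index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data to demonstrate loc/iloc and boolean masking
df = pd.DataFrame(
    {"age": [25, 34, 41], "city": ["Pune", "Delhi", "Chennai"]},
    index=[10, 20, 30],  # explicit (label) index differs from implicit positions 0..2
)

print(df.loc[20, "age"])   # 34 -> label-based access
print(df.iloc[1]["age"])   # 34 -> position-based access

mask = df["age"] > 30      # boolean Series
print(df.loc[mask, "city"].tolist())  # ['Delhi', 'Chennai']
```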
6 Dataframe Manipulation

Adding a new row/column
Row:
df.loc[explicit_row_num] = ['a', 1]
df.loc[len(df.index)] = ['a', 1]  # this will add a row at the end of the dataframe
Column:
df['new_col'] = data

Deleting a row/column
df.drop(labels=None, axis=0)
Row:
df.drop(3, axis=0)  # here 3 is the explicit index; axis=0 is for rows
Column:
df.drop('col_name', axis=1)

Renaming
Row:
df.index = new_indices
Column:
df.rename({'old_name': 'new_name'}, axis=1)

7 Operations

Sorting
df.sort_values(['col1'], ascending=[True])

Group based data filtering
df.groupby('group_col_name').filter(boolean function based on condition)
E.g.
df.groupby('director_name').filter(lambda x: x["budget"].max() >= 100)
# This filters all rows of those directors whose maximum budget is
# greater than 100 (million)

Apply
df.groupby('group_col_name').apply(function)
df['col'].apply(function)
Applies a function along one of the axes of the dataframe.
E.g.
data[['revenue', 'budget']].apply(np.sum, axis=1)
# sums values of revenue and budget across each row

Group based apply
def func(x):
    x["risk"] = x["budget"] - x["revenue"].mean() >= 0
    return x
data_risk = data.groupby("director_name").apply(func)
# Finds movies whose budget is higher than its director's average revenue

Pivot
df.pivot(index=['list of columns'], columns='col_name', values='col_name')
Opposite of melt: converts a dataframe from long to wide format.
Outputs a multi-index dataframe.
E.g.
data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'],
                columns='time', values='reading')

Cut
df['new_cat_column'] = pd.cut(df['continuous_col'], bins=bin_values, labels=label_values)
Bins continuous data into categorical groups.
E.g.
data_tidy['temp_cat'] = pd.cut(data_tidy['Temperature'],
                               bins=temp_points, labels=temp_labels)

Shift
df['col'].shift(periods=n, axis=0)
Shifts the values of rows/columns.
E.g.
df["Marks"].shift(periods=1, axis=0)
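The manipulation and operations calls above can be combined in one short sketch; the director names, budgets, and bin edges below are invented for illustration.

```python
import pandas as pd

# Toy movie data (director/budget values are made up)
data = pd.DataFrame({
    "director": ["X", "X", "Y"],
    "budget":   [120, 80, 50],
})

# Dataframe manipulation: append a row at the end, then add a categorical column
data.loc[len(data.index)] = ["Y", 200]
data["size"] = pd.cut(data["budget"], bins=[0, 100, 250], labels=["small", "large"])
print(data["size"].tolist())  # ['large', 'small', 'small', 'large']

# Group-based filtering: keep only directors whose maximum budget is >= 100
big = data.groupby("director").filter(lambda g: g["budget"].max() >= 100)
print(sorted(big["director"].unique()))  # ['X', 'Y']

# Shift budgets down by one row (first value becomes NaN)
print(data["budget"].shift(periods=1).tolist())  # [nan, 120.0, 80.0, 50.0]
```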
8 Joins

Concat
pd.concat([df1, df2], axis=0)
# for concatenating horizontally, change to axis=1

Merge
df1.merge(df2, on='foreign_key', how='type_of_join')
Optional -> left_on and right_on
E.g. df1.merge(df2, on='id', how='inner')

9 Groupby

Grouping based on a single aggregate
df.groupby('group_col_name')['col(s)'].aggregate_function()
E.g.
df.groupby('director_name')['title'].count()
# Finds number of titles per director

Grouping based on multiple aggregates
df.groupby(['group_col_name'])['col'].aggregate(['func1', 'func2'])
E.g.
df.groupby(['director_name'])["year"].aggregate(['min', 'max'])
# Finds first and most recent year of movies made by all directors
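A runnable sketch of merge and groupby aggregation; the tables and the 'id' key are hypothetical.

```python
import pandas as pd

# Hypothetical tables joined on a shared 'id' key
left = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
right = pd.DataFrame({"id": [1, 2, 3], "marks": [90, 80, 70]})

merged = left.merge(right, on="id", how="inner")
print(len(merged))  # 2 -> inner join keeps only matching ids

# Groupby with one aggregate, then with multiple aggregates
movies = pd.DataFrame({"director": ["X", "X", "Y"], "year": [1999, 2005, 2010]})
print(movies.groupby("director")["year"].count().tolist())  # [2, 1]
print(movies.groupby("director")["year"].aggregate(["min", "max"]))
```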

10 Cleaning our data

None and NaN
"NaN" is for columns with numbers as their values; "None" is for columns
with non-number entries (e.g. String, object type, etc.).
Can check for null values using isna():
df.isna()  # returns the dataframe with True/False for null values in the respective element's position
df.isna().sum()  # returns number of null values per column. Can modify with df.isna().sum(axis=1) for each row's null count
df.isna().sum().sum()  # returns total number of null values

Filling null values
df.fillna(n)  # fills null values with value 'n'

Dropping null values
df.dropna(axis=0)
# Default axis=0, use 1 for columns
# Drops rows/columns with even a single missing value

Duplicates and dropping duplicates
Find duplicate rows
df.duplicated(subset=None, keep='first')
Returns a boolean series with each duplicate row marked as True.
# subset can be used to specify certain column(s) for identifying the duplicates
# keep determines which duplicates to mark:
#   'first': marks all duplicates as True except for the first occurrence
#   'last': marks all duplicates as True except for the last occurrence
#   False: marks all duplicates as True

Drop duplicate values
df.drop_duplicates(subset=None, keep='first')
# Parameters have the same meaning as in df.duplicated,
# except here it will drop the rows marked as duplicates

11 Data Restructuring

Melt
pd.melt(df, id_vars=['list of columns'])
Converts a dataframe from wide to long format.
E.g.
pd.melt(data, id_vars=['Date', 'Parameter', 'Drug_Name'])

12 Misc Topics

Datetime
Convert to Datetime object: pd.to_datetime(df['col'])
Extracting information:
df['col'][0].year  # extracts the year for the 0th index value (here 0 is the implicit index). Use .month and .day for the respective data
df['col'].dt.year  # extracts the year for the whole column (all the datetime values)
df['col'][0].strftime('%m%Y')  # formats the selected datetime value into the required format (month and year in this case)

String functions
We can use .str to apply string functions to any column:
df['col'].str.function()
E.g.
i. data_tidy['Date'].str.split('-')
# This will split the "Date" column into elements separated by "-"
ii. data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')]
# Will filter out rows containing the string "hydrochloride"
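A combined sketch of the cleaning, datetime, and .str calls above; the dates and readings are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented readings with one missing value and one duplicate row
df = pd.DataFrame({
    "Date": ["01-2022", "01-2022", "02-2022"],
    "reading": [1.0, 1.0, np.nan],
})

print(df.isna().sum().sum())     # 1 -> one null value in total
print(df.duplicated().tolist())  # [False, True, False]

clean = df.drop_duplicates().fillna(0)  # drop the repeat, fill the null
print(clean["reading"].tolist())        # [1.0, 0.0]

# .str and datetime accessors
print(clean["Date"].str.split("-").tolist())  # [['01', '2022'], ['02', '2022']]
dates = pd.to_datetime(clean["Date"], format="%m-%Y")
print(dates.dt.year.tolist())                 # [2022, 2022]
```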
