Pandas Cheat Sheet Final
Importing
Convention: import pandas as pd
Feature exploration (masking, filtering)
Sorting: df.sort_values(['col1'], ascending=[True])
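A minimal sketch of sorting and masking, assuming a small made-up frame. The column names and values are placeholders, not data from the cheat sheet.

```python
import pandas as pd

# Hypothetical data; column names are placeholders.
df = pd.DataFrame({"col1": [3, 1, 2], "col2": ["c", "a", "b"]})

# sort_values returns a new DataFrame sorted by col1 in ascending order.
sorted_df = df.sort_values(["col1"], ascending=[True])

# A boolean mask keeps only the rows where the condition is True.
masked_df = df[df["col1"] > 1]
```

Both calls leave `df` itself unchanged; assign the result if you want to keep it.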
Filtering groups: .groupby(...).filter(lambda x: x["budget"]. ...)
Pivot: data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'], columns=...)
Outputs a multi-index dataframe
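The group filter can be sketched as follows. The `movies` frame, the `director` column and the threshold of 5 are assumptions made for illustration; only the `groupby(...).filter(lambda x: ...)` pattern comes from the sheet.

```python
import pandas as pd

# Hypothetical movie data.
movies = pd.DataFrame({
    "director": ["A", "A", "B", "B"],
    "budget": [10, 30, 1, 3],
})

# filter() receives each group as a DataFrame and keeps ALL rows of a group
# when the function returns True for that group.
big_budget = movies.groupby("director").filter(
    lambda x: x["budget"].mean() >= 5
)
```

Unlike a row mask, the condition here is evaluated once per group, so rows are kept or dropped together with their group.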
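Reading a file: pd.read_csv('filename.csv')
Masking with several conditions: data[(data['col1'] == val1) & (data['col2'] == val2)]
Grouping: df.groupby('group_col_name')
Other file types such as excel and json can be handled too.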
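A small sketch of reading a CSV and masking on two conditions. It reads from an in-memory string so that it runs without a file on disk; the column names and values are invented for the example.

```python
import io

import pandas as pd

# Stand-in for 'filename.csv'; read_csv accepts a path or a file-like object.
csv_text = "col1,col2\n1,a\n2,b\n"
df = pd.read_csv(io.StringIO(csv_text))

# Each condition needs its own parentheses; combine them with & (and) or | (or).
subset = df[(df["col1"] == 2) & (df["col2"] == "b")]

# Without a path argument, to_json returns the JSON text as a string.
json_text = df.to_json(orient="records")
```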
Apply
df.apply(function)
df['col'].apply(function)
Filtering
E.g. df[(df['year'] == '2022')]
Binning: bins continuous data into categorical groups
pd.cut(df['continuous_col'], bins=bin_values, labels=label_values)
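The two forms of apply can be sketched like this. The frame, the `double` helper and the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"revenue": [100, 250], "budget": [40, 300]})


def double(value):
    """Illustrative helper applied to one value at a time."""
    return value * 2


# Series.apply calls the function once per element of the column.
df["revenue_x2"] = df["revenue"].apply(double)

# DataFrame.apply with axis=1 calls the function once per row.
df["total"] = df.apply(lambda row: row["revenue"] + row["budget"], axis=1)
```

For simple arithmetic like this, plain column operations (`df["revenue"] * 2`) are usually faster; apply is most useful when the logic does not vectorize easily.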
E.g.
def func(x):
    ...
# sums values of revenue and budget across each row
data_risk = data.groupby(...)
Group filter condition: ["revenue"].mean() >= 0
Writing to other formats: df.to_json / df.to_excel, etc.
data_tidy['temp_cat'] = pd.cut(..., bins=temp_points, ...)
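A hedged sketch of binning with `pd.cut`. The ages, bin edges and labels are illustrative assumptions, and `right=False` is a choice made here so that each bin includes its lower edge.

```python
import pandas as pd

# Hypothetical continuous values to bin.
ages = pd.Series([67, 40, 34])

# Bin edges and labels; there must be one label fewer than there are edges.
age_points = [0, 12, 18, 60, float("inf")]
age_labels = ["<12", "Teen", "Adult", "Older"]

# right=False makes the bins [0, 12), [12, 18), [18, 60), [60, inf).
age_groups = pd.cut(ages, bins=age_points, labels=age_labels, right=False)
```

The result is a categorical Series, so the labels keep their order (`<12` < `Teen` < `Adult` < `Older`) for sorting and grouping.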
3 Series and Dataframes
Series: pd.Series(['a', 'b', 'c'])
Binning example: 0-11 -> <12, 12-17 -> Teen, 18-59 -> Adult, 60 & above -> Older
Values 67, 40 and 34 fall into Older, Adult and Adult respectively.
Adding a row: df.loc[explicit_row_num] = ['a', 1]
or df.loc[len(df.index)] = ['a', 1]
Concatenating: pd.concat([df1, df2], axis = 0) (for concatenating horizontally, change axis = 1)
Creating a dataframe
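The snippets above can be combined into one runnable sketch. The frames and values are made up, and the row-oriented constructor call is an assumption about how the sheet builds its example.

```python
import pandas as pd

# A Series is a single labelled column of values.
s = pd.Series(["a", "b", "c"])

# Row-oriented construction: one inner list per row (hypothetical data).
df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["name", "id"])

# Append a row at the next free integer label.
df1.loc[len(df1.index)] = ["c", 3]

# Column-oriented construction: one list per column.
df2 = pd.DataFrame({"name": ["d"], "id": [4]})

# axis=0 stacks rows; ignore_index renumbers the result from 0.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([df1, df1], axis=1)
```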
Row oriented / column oriented:
   name  id
0  a     1
1  b     2
[Figure: CONCAT of df1 and df2 along axis = 0]
[Figure: group by with average aggregation]
Shifts the values of rows/columns
E.g. df["Marks"].shift(periods = 1, axis = 0)
Deleting a row/column
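A short sketch of what shift does, using invented marks. The `prev_marks` column name is hypothetical.

```python
import pandas as pd

# Hypothetical marks.
df = pd.DataFrame({"Marks": [55, 60, 70]})

# Move every value down by one row; the first row has no predecessor,
# so it becomes NaN and the column is converted to float.
df["prev_marks"] = df["Marks"].shift(periods=1)

# Difference from the previous row, a common use of shift.
df["change"] = df["Marks"] - df["prev_marks"]
```

A negative `periods` value shifts in the opposite direction.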
10 Cleaning our data
Shape
df.shape
Deleting a row: df.drop(3, axis=0)  # Here 3 is the explicit index
Deleting a column: df.drop('col_name', axis=1)
None and nan
"NaN" is for columns with numbers as their value
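A minimal sketch of the inspection and cleaning calls, assuming a tiny frame with one missing text entry and one missing number. All names and values are placeholders.

```python
import pandas as pd

# Hypothetical data with a missing string and a missing number.
df = pd.DataFrame({
    "name": ["a", None, "c"],
    "score": [1.0, float("nan"), 3.0],
})

n_rows, n_cols = df.shape        # shape is (rows, columns)

null_mask = df.isna()            # True wherever a value is missing

cleaned = df.dropna(axis=0)      # drop rows that contain any missing value

without_score = df.drop("score", axis=1)   # delete a column by name
without_row_1 = df.drop(1, axis=0)         # delete a row by its index label
```

`drop` and `dropna` return new frames by default, so `df` still has all three rows afterwards.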
"None" is for columns with non-number entries (e.g. String, object type, etc.)
Can check for null values using isna()
Head (first n rows, default 5): df.head(n)
Tail (last n rows, default 5): df.tail(n)
Renaming a column
12 Misc Topics
Merging: df1.merge(df2, on='foreign_key', how='type_of_join')
Optional -> left_on and right_on
E.g. df1.merge(df2, on='id', how='inner')
Datetime
Convert to Datetime object: pd.to_datetime(df['col'])
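A sketch of an inner merge followed by a datetime conversion. The two frames, the `joined` column and its dates are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frames sharing the key column "id".
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({
    "id": [2, 3, 4],
    "joined": ["2022-01-05", "2021-07-19", "2020-03-01"],
})

# An inner join keeps only the ids present in both frames.
merged = df1.merge(df2, on="id", how="inner")

# Convert the text column to datetime, then use the .dt accessor.
merged["joined"] = pd.to_datetime(merged["joined"])
years = merged["joined"].dt.year
```

When the key columns have different names in the two frames, pass `left_on` and `right_on` instead of `on`.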
Accessing a row
df.loc[i]
df.iloc[ii]
Dropping rows with missing values: df.dropna(axis = 0)
# subset can be used to specify certain column(s) for identifying duplicates (... and year in this case)
first : Marks all duplicates as True except for the first occurrence
Grouping based on ...
E.g. .groupby(...)['title'].count()
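Duplicate handling and a grouped count can be sketched together. The `movies` frame and its columns are hypothetical, chosen to echo the title/year/director wording on the sheet.

```python
import pandas as pd

# Hypothetical movie data; the last row repeats the first title and year.
movies = pd.DataFrame({
    "director": ["A", "A", "B", "A"],
    "title": ["t1", "t2", "t3", "t1"],
    "year": [2001, 2005, 2010, 2001],
})

# keep="first" marks every repeat as True except its first occurrence.
dup_mask = movies.duplicated(subset=["title", "year"], keep="first")

# Same parameters, but the marked rows are dropped instead of flagged.
deduped = movies.drop_duplicates(subset=["title", "year"], keep="first")

# Number of titles per director, and each director's first and latest year.
titles_per_director = deduped.groupby("director")["title"].count()
year_range = deduped.groupby("director")["year"].agg(["min", "max"])
```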
11 Data Restructuring
Melt: pd.melt(data, id_vars=['Date', ...
Grouping with aggregates: df.groupby(['group_col_name']).agg(...)
E.g. # Finds first and recent year of movies made by all directors
Accessing columns: df[['col1', 'col2']]
Slicing rows: df.loc[1:3] (... indices here)
Duplicates: returns a boolean series with each duplicate row marked as True
df.drop_duplicates(subset=None, keep='first')
# Parameters have the same meaning as in df.duplicated, except here it will drop the rows marked duplicate
String functions: we can use .str to apply string functions to any column
df['col'].str.function()
E.g.
ii. data_tidy.loc[data_tidy['Drug_Name'].str.contains('hydrochloride')]
# Keeps only the rows whose Drug_Name contains the string "hydrochloride"
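A hedged sketch of melting, pivoting back, and filtering with `.str.contains`. The measurement columns (`Temperature`, `Pressure`), the drug names and the `Reading` value name are invented; only `Date`, `Drug_Name`, `Parameter` and the `hydrochloride` filter come from the sheet. Passing a list as `index` to `pivot` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical wide-format data.
data = pd.DataFrame({
    "Date": ["2022-01-01", "2022-01-02"],
    "Drug_Name": ["diltiazem hydrochloride", "docetaxel"],
    "Temperature": [20.0, 21.0],
    "Pressure": [1.0, 1.1],
})

# Wide to long: one row per (Date, Drug_Name, Parameter) combination.
data_melt = pd.melt(
    data,
    id_vars=["Date", "Drug_Name"],
    var_name="Parameter",
    value_name="Reading",
)

# Long to wide: a list passed as index yields a multi-index frame,
# which reset_index flattens back into ordinary columns.
data_tidy = data_melt.pivot(
    index=["Date", "Drug_Name"], columns="Parameter", values="Reading"
).reset_index()

# Keep only the rows whose Drug_Name contains the substring.
hcl = data_tidy.loc[data_tidy["Drug_Name"].str.contains("hydrochloride")]
```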