Data Analysis With Pandas
Pandas
• Python is a great language for doing data
analysis, primarily because of the fantastic
ecosystem of data-centric Python packages.
• Pandas is one of those packages, and makes
importing and analyzing data much easier.
• Pandas builds on packages like NumPy and
matplotlib to give you a single, convenient
place to do most of your data analysis and
visualization work.
Importing data with Pandas
import pandas as pd
m = pd.read_csv("movies.csv")
Example:
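A minimal sketch of the import step. Since movies.csv itself is not reproduced here, the CSV text, its column names, and its rows below are made-up stand-ins, read from an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for movies.csv: the columns and rows are
# invented for illustration and are not the original dataset.
csv_text = """name,year,rating,duration,genre
Alpha,2001,3.5,110,Drama
Beta,2005,4.0,95,Comedy
Gamma,2010,2.5,130,Action
"""

# read_csv accepts a file path or any file-like object.
m = pd.read_csv(io.StringIO(csv_text))

print(m.columns.tolist())  # column names taken from the header row
print(m.dtypes)            # each column gets its own inferred dtype
```

With a real file on disk, the call would simply be pd.read_csv("movies.csv") as shown above.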
Head and Tail
>>> m.shape
(10, 5)
>>> x = m.shape
>>> type(x)
<type 'tuple'>
>>> x[0]
10
>>> x[1]
5
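As a rough illustration of this slide, the sketch below builds a hypothetical 10-row, 5-column DataFrame (the data is invented to match the (10, 5) shape) and inspects it with head, tail, and shape.

```python
import pandas as pd

# Hypothetical data: 10 rows and 5 columns, mirroring the (10, 5) shape.
m = pd.DataFrame({
    "name": [f"movie_{i}" for i in range(10)],
    "year": [2000 + i for i in range(10)],
    "rating": [3.0 + 0.1 * i for i in range(10)],
    "duration": [90 + 5 * i for i in range(10)],
    "genre": ["Drama", "Comedy"] * 5,
})

first_rows = m.head()     # first 5 rows by default
last_rows = m.tail(3)     # last 3 rows
n_rows, n_cols = m.shape  # shape is a (rows, columns) tuple

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
```

Note that on Python 3, type(m.shape) prints as <class 'tuple'> rather than <type 'tuple'>.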
Indexing DataFrames with Pandas
m.iloc[0:5,:]
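A few more position-based lookups in the same style, sketched on a small made-up DataFrame (the column names and values are assumptions, not the original data).

```python
import pandas as pd

# Hypothetical six-row table used only to demonstrate iloc.
m = pd.DataFrame({
    "name": list("abcdef"),
    "year": [2001, 2002, 2003, 2004, 2005, 2006],
    "rating": [3.0, 3.5, 4.0, 4.5, 5.0, 2.5],
})

first_five = m.iloc[0:5, :]      # rows at positions 0-4, all columns
first_two_cols = m.iloc[:, 0:2]  # all rows, columns at positions 0-1
single_value = m.iloc[2, 1]      # row position 2, column position 1
last_row = m.iloc[-1, :]         # negative positions count from the end

print(first_five)
print(single_value)
```

As with ordinary Python slices, the end position of an iloc slice is excluded.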
Some indexing examples
• Now that we know how to retrieve rows and columns by
position, it’s worth looking into the other major way to work with
DataFrames, which is to retrieve rows and columns by label.
• A major advantage of Pandas over NumPy is that each of the
columns and rows has a label. Working with column positions is
possible, but it can be hard to keep track of which number
corresponds to which column.
• We can work with labels using the pandas.DataFrame.loc indexer,
which allows us to index using labels instead of positions.
• We can display the first five rows of m using loc like this:
m.loc[0:4, :] (with the default integer index; unlike iloc, a loc
slice includes its end label).
• There’s a third, even easier, way to retrieve a whole column.
We can just specify the column name in square brackets, like
with a dictionary:
m["year"]
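The label-based lookups described in these bullets can be sketched as follows. The DataFrame is a made-up example with the default integer index, so row labels happen to coincide with row positions.

```python
import pandas as pd

# Hypothetical table; with the default index, row labels are 0..5.
m = pd.DataFrame({
    "name": list("abcdef"),
    "year": [2001, 2002, 2003, 2004, 2005, 2006],
    "rating": [3.0, 3.5, 4.0, 4.5, 5.0, 2.5],
})

by_label = m.loc[0:4, :]   # labels 0 through 4, end label included
by_pos = m.iloc[0:5, :]    # positions 0 through 4, end position excluded

years_loc = m.loc[:, "year"]  # whole column by label
years_bracket = m["year"]     # same column, dictionary-style

subset = m.loc[0:2, ["name", "rating"]]  # several columns by label

print(by_label)
print(subset)
```

Both slices return the same five rows here; they would differ if the index labels were not 0, 1, 2, and so on.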
Data types
• When we retrieve a single column, we’re
actually retrieving a Pandas Series object. A
DataFrame stores tabular data, but a Series
stores a single column or row of data.
• We can verify that a single column is a
Series:
type(m["rating"])
pandas.core.series.Series
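A short sketch of the Series versus DataFrame distinction, using invented data. It also shows that a single row comes back as a Series, and that passing a list of column names keeps the result a DataFrame.

```python
import pandas as pd

# Hypothetical data for illustration only.
m = pd.DataFrame({
    "name": ["a", "b", "c"],
    "rating": [3.0, 4.0, 5.0],
})

single_column = m["rating"]       # one column  -> Series
one_column_frame = m[["rating"]]  # list of names -> DataFrame
single_row = m.iloc[0]            # one row -> Series indexed by column name

print(type(single_column))
print(type(one_column_frame))
print(type(single_row))
```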
Series object
s1 = pd.Series([1,2])
s1
Series object
pd.DataFrame([s1,s2])
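The definition of s2 is not shown on the slide, so the sketch below assumes a second Series of two names. It illustrates that each Series in the list becomes one row of the resulting DataFrame.

```python
import pandas as pd

s1 = pd.Series([1, 2])
# Assumed definition: s2 does not appear in the slides.
s2 = pd.Series(["Sachin Tendulkar", "Rahul Dravid"])

# Each Series in the list becomes a row; the Series indexes (0 and 1)
# become the column labels.
df = pd.DataFrame([s1, s2])

print(df)
```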
Creating DataFrames
pd.DataFrame(
    [
        [1, 2],
        ["Sachin Tendulkar", "Rahul Dravid"]
    ]
)
Creating DataFrames
pd.DataFrame(
    [
        [1, 2],
        ["Sachin Tendulkar", "Rahul Dravid"]
    ],
    columns=["first", "second"],
)
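A sketch of the same constructor with row labels added. The index values "row1" and "row2" are hypothetical, chosen only to show label-based lookup; the dictionary form at the end is an alternative way to name columns.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2],
        ["Sachin Tendulkar", "Rahul Dravid"],
    ],
    columns=["first", "second"],
    index=["row1", "row2"],  # hypothetical row labels
)

# Row and column labels can now be used together with loc.
value = df.loc["row2", "second"]

# Alternative: a dict maps each column name to that column's values.
df2 = pd.DataFrame({
    "first": [1, "Sachin Tendulkar"],
    "second": [2, "Rahul Dravid"],
})

print(df)
print(value)
```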
DataFrame methods
type(m["name"])
pandas.core.series.Series
m["rating"].mean()
3.5200000000000005
m.mean(numeric_only=True)
DataFrame methods
m.mean(axis=1, numeric_only=True)
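To make the axis argument concrete, here is a small sketch on invented data. The numeric_only=True argument skips the text column; recent pandas versions raise an error if asked to average strings.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
m = pd.DataFrame({
    "name": ["a", "b", "c"],
    "rating": [3.0, 4.0, 5.0],
    "duration": [100, 120, 140],
})

# Default axis=0: one mean per column.
col_means = m.mean(numeric_only=True)

# axis=1: one mean per row, across the numeric columns.
row_means = m.mean(axis=1, numeric_only=True)

print(col_means)
print(row_means)
```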
DataFrame methods
• There are quite a few methods on Series and DataFrames that
behave like mean. Here are some handy ones:
– pandas.DataFrame.corr – finds the correlation between columns
in a DataFrame.
– pandas.DataFrame.count – counts the number of non-null
values in each DataFrame column.
– pandas.DataFrame.max – finds the highest value in each
column.
– pandas.DataFrame.min – finds the lowest value in each
column.
– pandas.DataFrame.median – finds the median of each
column.
– pandas.DataFrame.std – finds the standard deviation of each
column.
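The methods listed above can be tried on a small numeric table. The column names and values below are made up, and one missing value is included to show how count treats nulls.

```python
import pandas as pd

# Hypothetical numeric data; None becomes NaN (a null value).
scores = pd.DataFrame({
    "rating": [1.0, 2.0, 3.0, 4.0],
    "votes": [10.0, 20.0, 30.0, None],
})

counts = scores.count()    # non-null values per column
highest = scores.max()     # highest value per column
lowest = scores.min()      # lowest value per column
medians = scores.median()  # median per column, nulls skipped
spreads = scores.std()     # sample standard deviation per column
correlations = scores.corr()  # pairwise correlation between columns

print(counts)
print(correlations)
```

Each method except corr returns a Series with one entry per column; corr returns a square DataFrame.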
DataFrame with Pandas
• We can also perform math operations on Series or
DataFrame objects.
• For example, we can divide every value in the rating
column by 2 to switch the scale from 0-10 to 0-5:
m["rating"]/2
• All the common mathematical operators that work in
Python, like +, -, *, /, and ** will work, and will apply
to each element in a DataFrame or a Series.
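A brief sketch of element-wise arithmetic, using invented values. Operators apply to every element, and combining two objects of the same shape pairs up elements by label.

```python
import pandas as pd

# Hypothetical ratings on a 0-10 scale.
ratings = pd.Series([8.0, 6.0, 10.0])

halved = ratings / 2    # rescale to 0-5
squared = ratings ** 2  # exponentiation, element by element

# The same operators work on a whole DataFrame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
plus_one = df + 1
product = df * df       # element-wise, not a matrix product

print(halved)
print(product)
```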
Boolean indexing
fm = m[r_filter]
fm
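The slide does not show how r_filter is built, so the sketch below assumes it is a comparison on the rating column. The data and the threshold of 4 are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
m = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "rating": [3.0, 4.5, 2.0, 4.8, 4.0],
})

# Assumed definition: a comparison yields a Series of True/False values.
r_filter = m["rating"] > 4

# Indexing with that Series keeps only the rows where it is True.
fm = m[r_filter]

print(r_filter)
print(fm)
```

The filtered rows keep their original index labels (1 and 3 here).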
Multiple filtering
m[filter1]
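One way filter1 might combine several conditions is sketched below; its actual definition is not shown on the slide, so the conditions and data are assumptions. Conditions are joined with & (and), | (or), and ~ (not), and each one needs its own parentheses.

```python
import pandas as pd

# Hypothetical data for illustration.
m = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "year": [2001, 2010, 2012, 1999, 2015],
    "rating": [3.0, 4.5, 2.0, 4.8, 4.0],
})

# Assumed definition: both conditions must hold.
filter1 = (m["rating"] > 4) & (m["year"] > 2005)
both = m[filter1]

# Either condition may hold.
either = m[(m["rating"] > 4) | (m["year"] > 2005)]

# Negation of a condition.
not_high = m[~(m["rating"] > 4)]

print(both)
print(either)
print(not_high)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.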