Data Analysis With Pandas

Pandas is a Python package that makes importing and analyzing data easier. It builds on packages like NumPy and Matplotlib to provide a single convenient place to do most data analysis and visualization work. The document discusses how to import data with Pandas, such as reading in a CSV file. It also covers indexing and selecting subsets of data from DataFrames using labels, locations, and other Pandas methods.

Pandas

• Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages.
• Pandas is one of those packages, and makes importing and analyzing data much easier.
• Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis and visualization work.
Importing data with Pandas

• The first step we'll take is to read the data in.
• The data is stored as a comma-separated values, or CSV, file, where each row is separated by a new line, and each column by a comma (,).
Sample: movies.csv

no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
Read data

import pandas as pd

m = pd.read_csv("movies.csv")
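As a minimal sketch of the read step, the snippet below parses the sample rows from an in-memory string instead of a file on disk, so it runs without movies.csv being present. The name CSV_TEXT is an assumption introduced for illustration; with a real file, pd.read_csv("movies.csv") is used exactly as on the slide.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
# CSV_TEXT is a hypothetical name, not part of the original slides.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""

# read_csv accepts a path or any file-like object.
# The first line of the data supplies the column names.
m = pd.read_csv(io.StringIO(CSV_TEXT))

print(m.columns.tolist())
print(m.dtypes)  # pandas infers one type per column
```

Printing dtypes is a quick way to confirm that year and duration were parsed as integers and rating as a float.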
Head and Tail

• Once we read in a DataFrame, Pandas gives us two methods that make it fast to look at the data:
– pandas.DataFrame.head – returns the first N rows of a DataFrame (5 by default).
– pandas.DataFrame.tail – returns the last N rows of a DataFrame (5 by default).
• We'll use the head method to see what's in movies:

m.head()
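Both methods take an optional row count. The sketch below, which rebuilds the sample data from a string so that it runs on its own (CSV_TEXT and the result names are illustrative assumptions), shows head and tail with and without that argument.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

first_three = m.head(3)   # first 3 rows
last_two = m.tail(2)      # last 2 rows
default_head = m.head()   # no argument: first 5 rows

print(first_three)
print(last_two)
```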
Find number of rows and columns

>>> m.shape
(10, 5)
>>> x = m.shape
>>> type(x)
<class 'tuple'>
>>> x[0]
10
>>> x[1]
5
Indexing DataFrames with Pandas

• Earlier, we used the head method to print the first 5 rows of movies. We could accomplish the same thing using the pandas.DataFrame.iloc method.
• The iloc method allows us to retrieve rows and columns by position. In order to do that, we'll need to specify the positions of the rows that we want, and the positions of the columns that we want as well.
• The code below will replicate m.head():

m.iloc[0:5,:]
Some indexing examples

• m.iloc[:5,:] – the first 5 rows, and all of the columns for those rows.
• m.iloc[:,:] – the entire DataFrame.
• m.iloc[5:,5:] – rows from position 5 onwards, and columns from position 5 onwards. (movies has only 5 columns, at positions 0 to 4, so no columns are selected here.)
• m.iloc[:,0] – the first column, and all of the rows for the column.
• m.iloc[9,:] – the 10th row, and all of the columns for that row.
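The selections listed above can be tried out with the sketch below. It rebuilds the sample data from a string so that it runs on its own; CSV_TEXT and the result names are assumptions introduced for illustration.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

first_five = m.iloc[:5, :]    # rows at positions 0..4, every column
first_col = m.iloc[:, 0]      # a single column comes back as a Series
tenth_row = m.iloc[9, :]      # a single row is also a Series, labeled by column name
tail_block = m.iloc[5:, 3:]   # rows from position 5, columns from position 3
beyond = m.iloc[5:, 5:]       # slices past the last column are allowed and come back empty

print(tail_block)
print(beyond.shape)
```

Slices that run past the end do not raise an error with iloc, which is why the last selection yields five rows and zero columns for this data.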
Some indexing examples

• Now that we know how to retrieve rows and columns by position, it's worth looking into the other major way to work with DataFrames, which is to retrieve rows and columns by label.
• A major advantage of Pandas over NumPy is that each of the columns and rows has a label. Working with column positions is possible, but it can be hard to keep track of which number corresponds to which column.
• We can work with labels using the pandas.DataFrame.loc method, which allows us to index using labels instead of positions.
• We can display the first five rows of movies using the loc method. Unlike iloc, a loc slice includes its end label, so the labels 0 through 4 give five rows:

m.loc[0:4,:]
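One detail worth checking when moving between the two methods: a loc slice is label-based and includes its end label, while an iloc slice is position-based and excludes its end position. The sketch below illustrates the difference on the sample data, rebuilt from a string so that it runs on its own. CSV_TEXT, shifted and the other result names are assumptions introduced for illustration.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

by_label = m.loc[0:5, :]       # labels 0..5 inclusive: 6 rows
by_position = m.iloc[0:5, :]   # positions 0..4: 5 rows

# Labels and positions only coincide for the default 0, 1, 2, ... index.
shifted = m.set_index("no")        # row labels are now 1..10
row_label_1 = shifted.loc[1]       # the row labeled 1 (first movie)
row_position_1 = shifted.iloc[1]   # the row at position 1 (second movie)

print(len(by_label), len(by_position))
print(row_label_1["name"], "|", row_position_1["name"])
```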


Some indexing examples

• Column labels can make life much easier when you're working with data. We can specify column labels in the loc method to retrieve columns by label instead of by position. Because loc slices include the end label, :5 selects the rows labeled 0 through 5:

m.loc[:5,"year"]


Multiple indexing

m.loc[:5,["rating","year"]]
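A short sketch of what these label-based selections return, using the sample data rebuilt from a string so that it runs on its own (CSV_TEXT and the result names are illustrative assumptions). A single column label gives a Series, a list of labels gives a DataFrame with the columns in the order listed, and a label slice over columns includes both ends.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

years = m.loc[:5, "year"]                  # one label: a Series
subset = m.loc[:5, ["rating", "year"]]     # list of labels: a DataFrame, in list order
label_slice = m.loc[:2, "name":"rating"]   # column slice by label, both ends included

print(type(years).__name__, type(subset).__name__)
print(label_slice)
```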


Pandas Series objects

• We can retrieve an individual column in Pandas a few different ways. So far, we've seen two types of syntax for this:

m.iloc[:,2] – will retrieve the third column.

m.loc[:,"year"] – will also retrieve the third column.

• There's a third, even easier, way to retrieve a whole column. We can just specify the column name in square brackets, like with a dictionary:

m["year"]
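The three spellings can be compared directly. The sketch below assumes the column order of the sample file (no, name, year, rating, duration), in which year sits at position 2; CSV_TEXT and the result names are illustrative assumptions.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

year_position = m.columns.get_loc("year")  # position of the "year" label

by_position = m.iloc[:, year_position]
by_label = m.loc[:, "year"]
by_brackets = m["year"]

# All three refer to the same column of data.
print(year_position)
print(by_position.equals(by_label), by_label.equals(by_brackets))
```

Looking the position up with get_loc avoids hard-coding a number that silently changes meaning if the column order changes.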
Data types

• When we retrieve a single column, we're actually retrieving a Pandas Series object. A DataFrame stores tabular data, but a Series stores a single column or row of data.

• We can verify that a single column is a Series:

type(m["rating"])
pandas.core.series.Series
Series object

• We can create a Series manually to better understand how it works. To create a Series, we pass a list or NumPy array into the Series object when we instantiate it:

s1 = pd.Series([1,2])
s1
Series object

• A Series can contain any type of data, including mixed types. Here, we create a Series that contains string objects:

s2 = pd.Series(["Sachin Tendulkar", "Rahul Dravid"])
s2
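The sketch below collects a few ways of building a Series: from a list, from a NumPy array, from strings, and from mixed values. The variable names and the custom index labels "a" and "b" are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

s_list = pd.Series([1, 2])                    # from a Python list
s_array = pd.Series(np.array([1.5, 2.5]))     # from a NumPy array
s_text = pd.Series(["Sachin Tendulkar", "Rahul Dravid"])
s_mixed = pd.Series([1, "two", 3.0])          # mixed types are stored as generic objects
s_labeled = pd.Series([1, 2], index=["a", "b"])  # explicit row labels instead of 0, 1

print(s_list.dtype, s_array.dtype, s_mixed.dtype)
print(s_text[1])       # default labels are 0, 1, ...
print(s_labeled["b"])  # custom labels are used for lookup
```

A Series with mixed types works, but it gives up the fast numeric operations that a single numeric dtype allows.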
Creating DataFrames

• We can create a DataFrame by passing multiple Series into the DataFrame class.
• Here, we pass in the two Series objects we just created, s1 as the first row, and s2 as the second row:

pd.DataFrame([s1,s2])
Creating DataFrames

• We can also accomplish the same thing with a list of lists. Each inner list is treated as a row in the resulting DataFrame:

pd.DataFrame(
    [
        [1,2],
        ["Sachin Tendulkar", "Rahul Dravid"]
    ]
)
Creating DataFrames

• We can also name the columns by passing the columns keyword argument:

pd.DataFrame(
    [
        [1,2],
        ["Sachin Tendulkar", "Rahul Dravid"]
    ],
    columns = ["first","second"],
)
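The construction routes from these slides can be compared side by side. This is a sketch with illustrative variable names (from_series, from_lists, named), not names from the original deck.

```python
import pandas as pd

s1 = pd.Series([1, 2])
s2 = pd.Series(["Sachin Tendulkar", "Rahul Dravid"])

# Each Series becomes one row of the DataFrame.
from_series = pd.DataFrame([s1, s2])

# Each inner list becomes one row; columns are labeled 0, 1 by default.
from_lists = pd.DataFrame([[1, 2], ["Sachin Tendulkar", "Rahul Dravid"]])

# The columns keyword argument replaces the default 0, 1 labels.
named = pd.DataFrame(
    [[1, 2], ["Sachin Tendulkar", "Rahul Dravid"]],
    columns=["first", "second"],
)

print(from_series)
print(named)
print(named.loc[1, "second"])
```

Because each row here mixes numbers and text, every column ends up holding mixed values. Real tables usually keep one kind of value per column, as movies.csv does.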
DataFrame methods

• As we mentioned earlier, each column in a DataFrame is a Series object:

type(m["name"])
pandas.core.series.Series

• We can call most of the same methods on a Series object that we can on a DataFrame, including head:

m["name"].head()



DataFrame methods

• Pandas Series and DataFrames also have other methods that make calculations simpler. For example, we can use the pandas.Series.mean method to find the mean of a Series:

m["rating"].mean()
3.5200000000000005

• We can also call the similar pandas.DataFrame.mean method, which will find the mean of each numerical column in a DataFrame. Since pandas 2.0, non-numeric columns such as name are no longer skipped by default, so we pass numeric_only=True:

m.mean(numeric_only=True)
DataFrame methods

• We can modify the axis keyword argument to mean in order to compute the mean of each row or of each column.
• By default, axis is equal to 0, and will compute the mean of each column. We can also set it to 1 to compute the mean of each row. With numeric_only=True, this will only compute the mean of the numerical values in each row:

m.mean(axis=1, numeric_only=True)
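The sketch below runs the three mean calls on the sample data, rebuilt from a string so that it runs on its own (CSV_TEXT and the result names are illustrative assumptions). It passes numeric_only=True so that the text column name is left out of the calculation.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

rating_mean = m["rating"].mean()                # mean of one Series
column_means = m.mean(numeric_only=True)        # axis=0: one value per numeric column
row_means = m.mean(axis=1, numeric_only=True)   # axis=1: one value per row

print(rating_mean)
print(column_means)
print(row_means.head())
```

The row means here average unrelated quantities (serial number, year, rating, duration), so they only demonstrate the mechanics of axis=1 and are not a meaningful statistic for this table.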
DataFrame methods

• There are quite a few methods on Series and DataFrames that behave like mean. Here are some handy ones:
– pandas.DataFrame.corr – finds the correlation between columns in a DataFrame.
– pandas.DataFrame.count – counts the number of non-null values in each DataFrame column.
– pandas.DataFrame.max – finds the highest value in each column.
– pandas.DataFrame.min – finds the lowest value in each column.
– pandas.DataFrame.median – finds the median of each column.
– pandas.DataFrame.std – finds the standard deviation of each column.
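A minimal sketch of the methods listed above, applied to the numeric columns of the sample data. The data is rebuilt from a string so that the snippet runs on its own; CSV_TEXT, numeric and the result names are assumptions introduced for illustration.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

# Restrict to the numeric columns so every statistic is well defined.
numeric = m[["year", "rating", "duration"]]

counts = m.count()             # non-null values per column (works for text too)
highest = numeric.max()        # largest value per column
lowest = numeric.min()         # smallest value per column
medians = numeric.median()     # middle value per column
spreads = numeric.std()        # sample standard deviation per column
correlations = numeric.corr()  # pairwise correlation matrix

print(counts)
print(highest)
print(correlations)
```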
DataFrame with Pandas

• We can also perform math operations on Series or DataFrame objects.
• For example, we can divide every value in the rating column by 2 to switch the scale from 0-10 to 0-5:

m["rating"]/2

• All the common mathematical operators that work in Python, like +, -, *, /, and ** will work, and will apply to each element in a DataFrame or a Series.
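The element-wise behaviour can be sketched as follows, with the sample data rebuilt from a string so that the snippet runs on its own. CSV_TEXT and the result names are assumptions introduced for illustration.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

half_rating = m["rating"] / 2                     # Series divided by a scalar
squared_rating = m["rating"] ** 2                 # exponent applied to each element
years_since_first = m["year"] - m["year"].min()   # Series minus a scalar
doubled = m[["rating", "duration"]] * 2           # works on a whole DataFrame too

print(half_rating.head())
print(years_since_first.tolist())
print(doubled.head())
```

None of these operations change m itself; each returns a new Series or DataFrame, so the result has to be assigned if it is needed later.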
Boolean indexing

r_filter = m["rating"] > 3.7
r_filter

fm = m[r_filter]
fm
Multiple filtering

filter1 = (m["rating"] > 3.6) & (m["year"] > 1990)
filter1

m[filter1]
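A sketch of how these filters behave on the sample data, rebuilt from a string so that the snippet runs on its own. The names either and not_recent are illustrative assumptions; r_filter and filter1 follow the slides. A comparison produces a Series of True/False values, and indexing with it keeps the rows marked True. Conditions are combined with & (and), | (or) and ~ (not), and each condition needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import io

import pandas as pd

# Sample rows from movies.csv, embedded so the example is self-contained.
CSV_TEXT = """no,name,year,rating,duration
1,Dhadakebaz,1986,3.2,7560
2,Dhumdhadaka,1985,3.8,6300
3,Ashi hi banva banvi,1988,4.1,7802
4,Zapatlela,1993,3.7,6022
5,Ayatya Gharat Gharoba,1991,3.4,5420
6,Navra Maza Navsacha,2004,3.9,4904
7,Dedanadan,1987,3.4,5623
8,Gammat Jammat,1987,3.4,7563
9,Eka peksha ek,1990,3.2,6244
10,Pachhadlela,2004,3.1,6956
"""
m = pd.read_csv(io.StringIO(CSV_TEXT))

# One condition: a boolean Series with one True/False per row.
r_filter = m["rating"] > 3.7
fm = m[r_filter]

# Two conditions joined with "and".
filter1 = (m["rating"] > 3.6) & (m["year"] > 1990)

# "or" and "not" follow the same pattern.
either = (m["rating"] > 3.6) | (m["year"] > 1990)
not_recent = ~(m["year"] > 1990)

print(fm)
print(m[filter1])
print(m[either]["no"].tolist())
print(m[not_recent]["no"].tolist())
```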
