Jupyter Notebook Viewer1
Pandas
Pandas provides high-level data structures and manipulation tools that make data analysis fast and easy in
Python.
In [2]:
Series
A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an
associated array of data labels, called its index.
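The code for the next cell did not survive in this copy. As a minimal sketch, assuming the Series was built from a plain list (the variable name `s` is hypothetical), the following produces a Series with a default integer index and exposes its values and index:

```python
import pandas as pd

# Build a Series from a plain list; pandas supplies a default integer index 0..n-1.
s = pd.Series([5, 4, 3, 2, 1])

print(s)             # values shown next to their index labels
print(s.to_numpy())  # the underlying NumPy array of values
print(s.index)       # the index object holding the labels
```

The two parts of the definition map directly onto `s.to_numpy()` (the data) and `s.index` (the labels).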
In [13]:
0 5
1 4
2 3
3 2
4 1
dtype: int64
[5 4 3 2 1]
In [14]:
http://nbviewer.jupyter.org/gist/manujeevanprakash/996d18985be612072ee0 1/17
9/8/2018 Jupyter Notebook Viewer
In [27]:
a 5
b 4
c 3
d 2
e 1
f -7
h -29
dtype: int64
5
In [28]:
a 5
b 4
c 3
d 9
e 1
f -7
h -29
dtype: int64
Out[28]:
a 5
b 4
c 3
dtype: int64
In [31]:
a 5
b 4
c 3
d 9
e 1
dtype: int64
a 10
b 8
c 6
d 18
e 2
f -14
h -58
dtype: int64
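The cells that produced the outputs above are missing from this copy. The following is a sketch of operations that would be consistent with them, assuming a Series named `jeeva` (the name used in the later cells) with a custom label index; the exact statements in the original cells are not known:

```python
import pandas as pd

# A Series with explicit labels instead of the default integer index.
jeeva = pd.Series([5, 4, 3, 2, 1, -7, -29],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'h'])

first = jeeva['a']               # look up a single value by its label
jeeva['d'] = 9                   # assign a new value through a label
subset = jeeva[['a', 'b', 'c']]  # a list of labels selects several entries
positive = jeeva[jeeva > 0]      # a boolean mask keeps only the positive values
doubled = jeeva * 2              # arithmetic is element-wise and keeps the labels

print(first)
print(subset)
print(positive)
print(doubled)
```

In each case the link between label and value is preserved, which is the main difference from a plain NumPy array.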
In [34]:
import numpy as np
np.mean(jeeva) # you can apply numpy functions to a Series
Out[34]:
-2.0
In [37]:
print('b' in jeeva)  # checks whether the label is present in the Series index
print('z' in jeeva)
True
False
In [46]:
Fabregas 40000
Messi 75000
Ronaldo 85000
Rooney 50000
Van persie 67000
dtype: int64
In [49]:
Klose NaN
Messi 75000
Ronaldo 85000
Van persie 67000
Ballack NaN
dtype: float64
In [53]:
Out[53]:
Klose True
Messi False
Ronaldo False
Van persie False
Ballack True
dtype: bool
In [52]:
Out[52]:
Klose False
Messi True
Ronaldo True
Van persie True
Ballack False
dtype: bool
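The outputs above look like a Series built from a dictionary, then rebuilt against a different list of labels and checked for missing values. A minimal sketch under that assumption (the names `salaries`, `players` and `new_player` are hypothetical):

```python
import pandas as pd

# Dictionary keys become index labels, dictionary values become the data.
salaries = {'Fabregas': 40000, 'Messi': 75000, 'Ronaldo': 85000,
            'Rooney': 50000, 'Van persie': 67000}
players = ['Klose', 'Messi', 'Ronaldo', 'Van persie', 'Ballack']

# Labels with no matching key get NaN, which turns the dtype into float64.
new_player = pd.Series(salaries, index=players)

missing = pd.isnull(new_player)   # True where a value is missing
present = pd.notnull(new_player)  # True where a value is present

print(new_player)
print(missing)
print(present)
```

Labels that are in the dictionary but not in the index list (Fabregas and Rooney here) are simply dropped.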
In [64]:
Out[64]:
Player names
Klose NaN
Messi 75000
Ronaldo 85000
Van persie 67000
Ballack NaN
Name: Bundesliga players, dtype: float64
In [67]:
Out[67]:
Neymar NaN
Hulk 75000
Pirlo 85000
Buffon 67000
Anderson NaN
Name: Bundesliga players, dtype: float64
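Both the Series and its index carry a `name` attribute, and the whole index can be replaced by assignment. The sketch below assumes that is how the two outputs above were produced; the variable name `players` is hypothetical:

```python
import numpy as np
import pandas as pd

players = pd.Series([np.nan, 75000, 85000, 67000, np.nan],
                    index=['Klose', 'Messi', 'Ronaldo', 'Van persie', 'Ballack'])

players.name = 'Bundesliga players'  # name of the Series itself
players.index.name = 'Player names'  # name of the index
print(players)

# Assigning a new list replaces the labels in place; the values stay put.
# The new index is a fresh object, so the old index name is not carried over.
players.index = ['Neymar', 'Hulk', 'Pirlo', 'Buffon', 'Anderson']
print(players)
```

The new list must have the same length as the Series, otherwise pandas raises an error.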
Data Frame
A DataFrame is a spreadsheet-like structure containing an ordered collection of columns. Each column can
hold a different value type. A DataFrame has both a row index and a column index.
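As a minimal sketch of this structure, a DataFrame can be built from a dictionary of equal-length lists. The `states` data is the one shown in the last cell of this notebook; the `reordered` name is hypothetical:

```python
import pandas as pd

states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}

# Each dictionary key becomes a column; rows get a default integer index.
india = pd.DataFrame(states)
print(india)

# The columns argument fixes the column order explicitly.
reordered = pd.DataFrame(states, columns=['State', 'Language', 'Population'])
print(reordered)
```

Here `Population` is an integer column while `State` and `Language` hold strings, which illustrates that columns may differ in type.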
In [74]:
Out[74]:
   Language   Population  State
0  Gujarati   36          Gujarat
2  Telugu     67          Andhra
3  Kannada    89          Karnataka
4  Malayalam  34          Kerala
In [75]:
Out[75]:
   State      Language   Population
0  Gujarat    Gujarati   36
2  Andhra     Telugu     67
3  Karnataka  Kannada    89
4  Kerala     Malayalam  34
In [82]:
In [86]:
print(new_farme.columns)
print(new_farme['State'])  # retrieving a column with dictionary-style access
In [89]:
Out[89]:
a 36
b 44
c 67
d 89
e 34
Name: Population, dtype: int64
In [91]:
Out[91]:
State Karnataka
Language Kannada
Population 89
Per Capita Income NaN
Name: d, dtype: object
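The two outputs above show a column and a row being pulled out of the frame. The following sketch reproduces that with current pandas, assuming the frame was built with a fourth column that has no data and with the labels a to e as its index; `loc` is used for the row lookup here, which may differ from what the original cell used:

```python
import pandas as pd

states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}

# A column listed in `columns` but absent from the data is filled with NaN.
new_farme = pd.DataFrame(
    states,
    columns=['State', 'Language', 'Population', 'Per Capita Income'],
    index=['a', 'b', 'c', 'd', 'e'])

population = new_farme['Population']  # a column comes back as a Series
row_d = new_farme.loc['d']            # a row, selected by label, is also a Series

print(population)
print(row_d)
```

The column Series is named after the column, and the row Series is named after the row label.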
In [94]:
new_farme
Out[94]:
In [97]:
new_farme['Per Capita Income'] = 99  # the empty Per Capita Income column can be assigned a value
new_farme
Out[97]:
   State      Language   Population  Per Capita Income
a  Gujarat    Gujarati   36          99
c  Andhra     Telugu     67          99
d  Karnataka  Kannada    89          99
e  Kerala     Malayalam  34          99
In [99]:
Out[99]:
   State      Language   Population  Per Capita Income
a  Gujarat    Gujarati   36          0
c  Andhra     Telugu     67          2
d  Karnataka  Kannada    89          3
e  Kerala     Malayalam  34          4
In [104]:
Out[104]:
c Andhra Telugu 67 33
d Karnataka Kannada 89 22
In [119]:
Out[119]:
c Andhra Telugu 67 33
d Karnataka Kannada 89 22
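The cells behind the last few outputs are missing. As a sketch of the three common ways to fill a column (a scalar, an array, and a Series), assuming a small hypothetical frame named `frame`; this shows the general behaviour and is not a reconstruction of the lost cells:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'Population': [36, 44, 67, 89, 34]},
                     index=['a', 'b', 'c', 'd', 'e'])

frame['Per Capita Income'] = 99            # a scalar is broadcast to every row
scalar_fill = frame['Per Capita Income'].tolist()

frame['Per Capita Income'] = np.arange(5)  # an array must match the row count
array_fill = frame['Per Capita Income'].tolist()

# A Series is aligned on the index; rows without a matching label become NaN.
partial = pd.Series([33, 22], index=['c', 'd'])
frame['Per Capita Income'] = partial
print(frame)
```

Alignment on the index is what distinguishes assigning a Series from assigning a list or array of the same values.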
In [16]:
new_data = {'Modi': {2010: 72, 2012: 78, 2014: 98}, 'Rahul': {2010: 55, 2012: 34, 2014: 22}}
elections = DataFrame(new_data)
print(elections)  # the outer dict keys are columns and the inner dict keys are rows
elections.T  # transpose of the data frame
Modi Rahul
2010 72 55
2012 78 34
2014 98 22
Out[16]:
       2010  2012  2014
Modi   72    78    98
Rahul  55    34    22
In [17]:
DataFrame(new_data, index =[2012, 2014, 2016]) # you can assign an index for the data frame
Out[17]:
Modi Rahul
2012 78 34
2014 98 22
In [18]:
Out[18]:
Gujarat India
2010 72 55
2012 78 34
In [150]:
Out[150]:
In [155]:
px.index.name = 'year'
px.columns.name = 'politicians'
px
Out[155]:
politicians  Gujarat  India
year
2010         72       55
2012         78       34
In [156]:
px.values
Out[156]:
array([[72, 55],
[78, 34]], dtype=int64)
In [3]:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last
<ipython-input-3-e8b7ee2d0552> in <module>()
3 print index #u denotes unicode
4 print index[1:]# returns all the index elements except a.
----> 5 index[1] = 'f' # you cannot modify an index element. It will gener
C:\Users\tk\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas
177 """This method will not function because object is immutab
178 raise TypeError("'%s' does not support mutable operations.
--> 179 self.__class__)
180
181 __setitem__ = __setslice__ = __delitem__ = __delslice__ = _dis
In [22]:
print(px)
2013 in px.index  # checks whether 2013 is a label in the index of data frame px
Gujarat India
2010 72 55
2012 78 34
Out[22]:
False
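The traceback and the membership test above rest on two properties of index objects: they are immutable, and they support `in`. A self-contained sketch, assuming a hypothetical three-label index and a frame shaped like `px`:

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])
tail = index[1:]  # slicing returns a new Index without the first label

# Item assignment on an Index raises TypeError because indexes are immutable.
try:
    index[1] = 'f'
    mutated = True
except TypeError:
    mutated = False

px = pd.DataFrame({'Gujarat': {2010: 72, 2012: 78},
                   'India': {2010: 55, 2012: 34}})
has_2012 = 2012 in px.index  # the label exists
has_2013 = 2013 in px.index  # the label does not exist

print(list(tail), mutated, has_2012, has_2013)
```

Immutability is what lets several data structures share one index object safely.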
Reindex
In [27]:
5 Python
4 Java
3 c
2 c++
1 Php
dtype: object
1 Php
2 c++
3 c
4 Java
5 Python
dtype: object
In [28]:
Out[28]:
1 Php
2 c++
3 c
4 Java
5 Python
6 NaN
7 NaN
dtype: object
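The outputs above are consistent with `reindex`, which builds a new object whose rows follow a new list of labels. A minimal sketch, assuming the Series is named `var` as in the next cell (the names `var1` and `var2` are hypothetical):

```python
import pandas as pd

var = pd.Series(['Python', 'Java', 'c', 'c++', 'Php'], index=[5, 4, 3, 2, 1])

# Rearranges the data to follow the new label order.
var1 = var.reindex([1, 2, 3, 4, 5])

# Labels that did not exist before (6 and 7) are introduced with NaN.
var2 = var.reindex([1, 2, 3, 4, 5, 6, 7])

print(var1)
print(var2)
```

`reindex` returns a new Series and leaves `var` itself unchanged.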
In [31]:
var.reindex([1,2,3,4,5,6,7], fill_value =1) # you can use fill_value to fill the NaN values
Out[31]:
1 Php
2 c++
3 c
4 Java
5 Python
6 1
7 1
dtype: object
In [35]:
0 Dhoni
2 Sachin
4 Kohli
dtype: object
Out[35]:
0 Dhoni
1 Dhoni
2 Sachin
3 Sachin
4 Kohli
5 Kohli
dtype: object
In [36]:
Out[36]:
0 Dhoni
1 Sachin
2 Sachin
3 Kohli
4 Kohli
5 NaN
dtype: object
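The two outputs above match what `reindex` produces with its `method` argument, which fills gaps from neighbouring labels. A sketch under that assumption (the variable names are hypothetical):

```python
import pandas as pd

gh = pd.Series(['Dhoni', 'Sachin', 'Kohli'], index=[0, 2, 4])

# 'ffill' carries the previous value forward into each new label.
forward = gh.reindex(range(6), method='ffill')

# 'bfill' pulls the next value backward; label 5 has no later value, so it is NaN.
backward = gh.reindex(range(6), method='bfill')

print(forward)
print(backward)
```

Both methods require the original index to be sorted (monotonic), as it is here.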
In [45]:
import numpy as np
fp = DataFrame(np.arange(9).reshape((3,3)),index =['a','b','c'], columns =['Gujarat','
fp
Out[45]:
a 0 1 2
b 3 4 5
c 6 7 8
In [55]:
fp1 =fp.reindex(['a', 'b', 'c', 'd'], columns = states) # reindexing columns and indices
fp1
Out[55]:
a 0 NaN 2
b 3 NaN 5
c 6 NaN 8
a 0
b 1
c 2
d 3
e 4
dtype: int32
Out[62]:
c 2
d 3
e 4
dtype: int32
In [77]:
   State      Population  Language
2  Andhra     67          Telugu
3  Karnataka  89          Kannada
4  Kerala     34          Malayalam
In [82]:
Out[82]:
Language
0 Gujarati
1 Tamil
2 Telugu
3 Kannada
4 Malayalam
Out[102]:
5 Python
4 Java
3 c
2 c++
1 Php
dtype: object
In [103]:
print(var[5])    # an integer is looked up as a label in the integer index
print(var[2:4])  # a slice is interpreted by position
Python
3 c
2 c++
dtype: object
In [104]:
var[[3,2,1]]
Out[104]:
3 c
2 c++
1 Php
dtype: object
In [109]:
var[var == 'Php']
Out[109]:
1 Php
dtype: object
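With an integer index, plain brackets mix two behaviours: a single integer is a label, while a slice is positional. A sketch using the explicit `loc` (label) and `iloc` (position) accessors, which avoid the ambiguity; the result names are hypothetical:

```python
import pandas as pd

var = pd.Series(['Python', 'Java', 'c', 'c++', 'Php'], index=[5, 4, 3, 2, 1])

by_label = var.loc[5]         # label 5, the first row
by_position = var.iloc[2:4]   # third and fourth rows, whatever their labels
several = var.loc[[3, 2, 1]]  # several labels at once, in the given order
matches = var[var == 'Php']   # boolean mask built from a comparison

print(by_label)
print(by_position)
print(several)
print(matches)
```

These four selections give the same results as the bracket expressions in the cells above.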
In [111]:
Out[111]:
   State      Population  Language
0  Gujarat    36          Gujarati
2  Andhra     67          Telugu
3  Karnataka  89          Kannada
4  Kerala     34          Malayalam
In [114]:
Out[114]:
Population Language
0 36 Gujarati
1 44 Tamil
2 67 Telugu
3 89 Kannada
4 34 Malayalam
In [115]:
Out[115]:
   State      Population  Language
2  Andhra     67          Telugu
3  Karnataka  89          Kannada
In [117]:
Out[117]:
   State      Population  Language
0  Gujarat    36          Gujarati
2  Andhra     67          Telugu
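The code for the preceding DataFrame outputs is missing. The selections below are one plausible reading of them, not a confirmed reconstruction: a list of column names, a boolean filter on `Population` (the threshold of 50 is an assumption), and a positional row slice.

```python
import pandas as pd

states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = pd.DataFrame(states, columns=['State', 'Population', 'Language'])

two_columns = india[['Population', 'Language']]  # a list of names selects columns
crowded = india[india['Population'] > 50]        # a boolean mask selects rows
first_three = india[:3]                          # a slice selects rows by position

print(two_columns)
print(crowded)
print(first_three)
```

Note the asymmetry: a name or list of names in brackets refers to columns, while a slice or boolean mask refers to rows.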
In [4]:
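# for selecting specific rows and columns, older pandas offered the ix indexer;
# ix was removed in pandas 1.0, so use loc (labels) or iloc (positions) instead
import pandas as pd
from pandas import DataFrame
states ={'State' :['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],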
'Population': [36, 44, 67,89,34],
'Language' :['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = DataFrame(states, columns =['State', 'Population', 'Language'], index =['a', '
india
Out[4]:
   State      Population  Language
a  Gujarat    36          Gujarati
c  Andhra     67          Telugu
d  Karnataka  89          Kannada
e  Kerala     34          Malayalam
In [128]:
Out[128]:
State Language
a Gujarat Gujarati
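The final output selects one row and two columns at once. A sketch of how to do that in current pandas with `loc` and `iloc`, assuming the index labels a to e (the index list is cut off in the cell above) and hypothetical result names:

```python
import pandas as pd

states = {'State': ['Gujarat', 'Tamil Nadu', 'Andhra', 'Karnataka', 'Kerala'],
          'Population': [36, 44, 67, 89, 34],
          'Language': ['Gujarati', 'Tamil', 'Telugu', 'Kannada', 'Malayalam']}
india = pd.DataFrame(states, columns=['State', 'Population', 'Language'],
                     index=['a', 'b', 'c', 'd', 'e'])  # assumed labels

# loc takes row labels first, then column labels.
by_label = india.loc[['a'], ['State', 'Language']]

# iloc takes integer positions for the same selection.
by_position = india.iloc[[0], [0, 2]]

print(by_label)
print(by_position)
```

Passing lists on both axes keeps the result a DataFrame; passing a single label for the row would return a Series instead.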