Chapter 2 Data Handling using pandas - I(Series)
NumPy, Pandas and Matplotlib are three well-established Python libraries for scientific and
analytical use.
Installing Pandas
To install Pandas from the command line, type:
pip install pandas
Series
➢ one-dimensional array
➢ homogeneous data
➢ contains a sequence of values, each with an associated index
➢ values can be of any data type (int, float, list, string, etc.)
➢ data is mutable
➢ size is immutable
The data label associated with a particular value is called its index.
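The properties listed above can be checked on a small Series. This is only a sketch: the names and marks below are made-up sample data, not taken from the chapter.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
marks = pd.Series([72, 85, 91], index=["Asha", "Ravi", "Meena"])

labels = list(marks.index)   # the index: one label per value
values = marks.tolist()      # the data as a plain Python list

marks["Ravi"] = 88           # data is mutable: a value can be changed in place

print(labels)
print(values)
print(marks["Ravi"])
```

Here labels and values are captured before the assignment, so values still holds the original marks while marks["Ravi"] shows the updated one.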
Creation of Series
(a) Creation of Series from list
Program:
import pandas as pd
l=[10,20,30]
series1 = pd.Series(l)
# series1 = pd.Series([10,20,30])
print(series1)
Output:
0 10
1 20
2 30
dtype: int64
User-defined labels can be assigned to the index and then used to access elements of a Series.
Program:
import pandas as pd
series2 = pd.Series(["Kavi","Shyam"], index=[3,5])
print(series2)
Output:
3 Kavi
5 Shyam
dtype: object
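As a brief illustration of accessing elements through user-defined labels, the sketch below reuses series2 from the program above. The .loc and .iloc accessors are standard pandas; the variable names are only for this example.

```python
import pandas as pd

series2 = pd.Series(["Kavi", "Shyam"], index=[3, 5])

# With an integer index, square brackets look a value up by its label
by_label = series2[3]          # label 3
by_loc = series2.loc[5]        # .loc always uses labels
by_position = series2.iloc[0]  # .iloc always uses positions (0 is the first element)

print(by_label, by_loc, by_position)
```

Note that series2[0] would fail here, because 0 is not one of the labels; .iloc is the unambiguous way to ask for a position.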
(b) Creation of empty Series
Program:
import pandas as pd
print("Creation of empty Series")
s=pd.Series(dtype=int)
print(s)
Note: All of the statements below also create an empty Series
s=pd.Series()
s=pd.Series([],dtype=int)
s=pd.Series({},dtype=int)
s=pd.Series((),dtype=int)
Output:
Creation of empty Series
Series([], dtype: int32)
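The int32 in the output above is what some platforms report for dtype=int; other setups may show int64 instead. A minimal sketch, assuming a fixed-width dtype is wanted, is to name the width explicitly and then confirm that the Series is empty:

```python
import pandas as pd

# An explicit width keeps the dtype the same on every platform
s = pd.Series(dtype="int64")

print(s.empty)   # whether the Series has no elements
print(len(s))    # number of elements
print(s.dtype)
```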
(c) Creation of Series from Scalar value
Program:
import pandas as pd
print("create a series from scalar value")
s4=pd.Series(25,index=[10,11,12])
print(s4)
Output:
create a series from scalar value
10 25
11 25
12 25
dtype: int64
(d) Creation of Series from dictionary
Keys of the dictionary will become indices in the series.
Program:
import pandas as pd
print("create a series from dictionary")
d={'a':'ant','b':'bat'}
s5=pd.Series(d)
print(s5)
Output:
create a series from dictionary
a ant
b bat
dtype: object
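When an index is passed together with a dictionary, pandas picks the values by key in the order given, and a label that is not a key gets NaN. The sketch below illustrates this; the label 'c' is a hypothetical extra label added for the example.

```python
import pandas as pd

d = {'a': 'ant', 'b': 'bat'}

# 'c' is not a key of d, so its value is missing (NaN)
s = pd.Series(d, index=['b', 'a', 'c'])
print(s)
```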
(e) Creation of Series from ndarray
Program:
import numpy as np
import pandas as pd
print("create a series from ndarray")
a=np.array([10,20,30])
s6=pd.Series(a)
print(s6)
Output:
create a series from ndarray
0 10
1 20
2 30
dtype: int32
OR
Program:
import numpy as np
import pandas as pd
print("create a series from ndarray using arange function")
b=np.arange(10,20,3)
s7=pd.Series(b)
print(s7)
Output:
create a series from ndarray using arange function
0 10
1 13
2 16
3 19
dtype: int32
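A Series built from an ndarray can also be given user-defined labels, provided there is exactly one label per element. The following sketch (labels 'x', 'y', 'z' are made up for illustration) shows both the working case and the error raised on a length mismatch.

```python
import numpy as np
import pandas as pd

a = np.array([10, 20, 30])

# One label per element: this works
s = pd.Series(a, index=['x', 'y', 'z'])
print(s)

# Two labels for three elements: pandas raises ValueError
try:
    pd.Series(a, index=['x', 'y'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```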
Program:
import pandas as pd
s1=pd.Series([10,20,30,40,50],index=['I','II','III','IV','V'])
print(s1)
print("Assigning new index values")
s1.index=['one','two','three','four','five']
print(s1)
print("To access the element 30 using labelled indexing")
print(s1['three'])
print("To access the element 50 using positional indexing")
print(s1.iloc[4]) # .iloc selects by position
print("To access the elements 20 and 40 using labelled indexing")
print(s1[['two','four']])
print("To access the elements 20 and 40 using positional indexing")
print(s1.iloc[[1,3]])
print("To change the value at positional index 4")
s1.iloc[4]=55
print(s1)
Output:
I 10
II 20
III 30
IV 40
V 50
dtype: int64
Assigning new index values
one 10
two 20
three 30
four 40
five 50
dtype: int64
To access the element 30 using labelled indexing
30
To access the element 50 using positional indexing
50
To access the elements 20 and 40 using labelled indexing
two 20
four 40
dtype: int64
To access the elements 20 and 40 using positional indexing
two 20
four 40
dtype: int64
To change the value at positional index 4
one 10
two 20
three 30
four 40
five 55
dtype: int64
(B) Slicing
A part of a series can be extracted through slicing. We define which part of the series is to be
sliced by specifying the start and end parameters [start:end] with the series name.
When positional indices are used for slicing, the value at the end index position is excluded.
When labelled indexes are used for slicing, the value at the end index label is included in the
output.
Program:
import pandas as pd
s1=pd.Series([10,20,30,40,55],index=['one','two','three','four','five'])
print("Positional index used for slicing")
print(s1[1:4]) # excludes the value at index position 4
print("Labelled index used for slicing")
print(s1['one':'three'])
print("The series in reverse order")
print(s1[::-1])
print("To give same values for a given slice")
s1[1:4]=5
print(s1)
print("To give different values for a given slice")
s1[1:4]=[5,10,15]
print(s1)
Output:
Positional index used for slicing
two 20
three 30
four 40
dtype: int64
Labelled index used for slicing
one 10
two 20
three 30
dtype: int64
The series in reverse order
five 55
four 40
three 30
two 20
one 10
dtype: int64
To give same values for a given slice
one 10
two 5
three 5
four 5
five 55
dtype: int64
To give different values for a given slice
one 10
two 5
three 10
four 15
five 55
dtype: int64
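The rule that a positional slice excludes its end while a labelled slice includes it can be made explicit with the .iloc and .loc accessors. This is a minimal sketch on the same data as the program above; the variable names are chosen only for the example.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 55], index=['one', 'two', 'three', 'four', 'five'])

by_position = s1.iloc[1:4]        # positions 1, 2, 3 (position 4 is excluded)
by_label = s1.loc['two':'four']   # labels two, three, four (end label is included)

print(by_position)
print(by_label)
```

Both slices select the same three elements, even though the positional slice stops one position before its end value and the labelled slice stops at its end label.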
Attributes of Series
Attribute Name    Purpose
name              assigns a name to the Series
index.name        assigns a name to the index of the Series
values            returns an array of the values in the Series
size              returns the number of values in the Series object
empty             returns True if the Series is empty, and False otherwise
Program:
import pandas as pd
import numpy as np
s1=pd.Series({'a':np.nan,'b':20,'c':30,'d':40})
print(s1)
s1.name='NIMS'
print(s1)
s1.index.name='Division'
print(s1)
print(s1.size)
print(s1.values)
print(s1.empty)
print(s1.count())
s2=pd.Series(dtype=int)
print(s2)
s2.name='Test'
print(s2)
s2.index.name='Result'
print(s2)
print(s2.size)
print(s2.values)
print(s2.empty)
print(s2.count())
Output:
a NaN
b 20.0
c 30.0
d 40.0
dtype: float64
a NaN
b 20.0
c 30.0
d 40.0
Name: NIMS, dtype: float64
Division
a NaN
b 20.0
c 30.0
d 40.0
Name: NIMS, dtype: float64
4
[nan 20. 30. 40.]
False
3
Series([], dtype: int32)
Series([], Name: Test, dtype: int32)
Series([], Name: Test, dtype: int32)
0
[]
True
0
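Attributes are read without parentheses and give back a value that can be stored, whereas count() is a method and needs parentheses. The sketch below, using the same data as the program above, contrasts size (which includes NaN) with count() (which does not).

```python
import numpy as np
import pandas as pd

s = pd.Series({'a': np.nan, 'b': 20, 'c': 30, 'd': 40}, name='NIMS')

n_all = s.size        # attribute: counts every element, NaN included
n_valid = s.count()   # method: counts only the non-NaN elements
is_empty = s.empty    # attribute: True only when there are no elements

print(n_all, n_valid, is_empty, s.name)
```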
Methods of Series
Method     Explanation
head(n)    Returns the first n members of the series. If the value for n is not passed, then
           by default n takes 5 and the first five members are displayed.
count()    Returns the number of non-NaN values in the Series.
tail(n)    Returns the last n members of the series. If the value for n is not passed, then
           by default n takes 5 and the last five members are displayed.
Program:
import pandas as pd
s1=pd.Series([10,20,30,40,50,60,70,80,90])
print(s1.head())
print(s1.tail())
print(s1.head(2))
print(s1.tail(3))
Output:
0 10
1 20
2 30
3 40
4 50
dtype: int64
4 50
5 60
6 70
7 80
8 90
dtype: int64
0 10
1 20
dtype: int64
6 70
7 80
8 90
dtype: int64
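Two further cases of head(n) and tail(n) are worth knowing, shown here as a small sketch on the same series: a negative n drops members from the opposite end, and an n larger than the size simply returns the whole series.

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90])

all_but_last_two = s1.head(-2)    # everything except the last two members
all_but_first_two = s1.tail(-2)   # everything except the first two members
too_many = s1.head(100)           # n larger than the size: the whole series

print(all_but_last_two)
print(all_but_first_two)
print(len(too_many))
```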
(C) Multiplication of two Series
Again, it can be done in two different ways:
s1*s2
s1.mul(s2,fill_value=10)
(D) Division of two Series
Again, it can be done in two different ways:
s1/s2
s1.div(s2,fill_value=20)
Program:
import pandas as pd
s1=pd.Series([10,20,30])
s2=pd.Series([5,15,25,35])
print(s1+s2)
print(s1.add(s2,fill_value=40))
print(s1-s2)
print(s1.sub(s2,fill_value=40))
print(s1*s2)
print(s1.mul(s2,fill_value=40))
print(s1/s2)
print(s1.div(s2,fill_value=40))
Output:
0 15.0
1 35.0
2 55.0
3 NaN
dtype: float64
0 15.0
1 35.0
2 55.0
3 75.0
dtype: float64
0 5.0
1 5.0
2 5.0
3 NaN
dtype: float64
0 5.0
1 5.0
2 5.0
3 5.0
dtype: float64
0 50.0
1 300.0
2 750.0
3 NaN
dtype: float64
0 50.0
1 300.0
2 750.0
3 1400.0
dtype: float64
0 2.000000
1 1.333333
2 1.200000
3 NaN
dtype: float64
0 2.000000
1 1.333333
2 1.200000
3 1.142857
dtype: float64
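In the outputs above the two series line up by their default indexes 0, 1, 2, 3. The same matching happens with labelled indexes: values are paired by label, not by position. The sketch below uses made-up labels to show this, along with the effect of fill_value.

```python
import pandas as pd

# Hypothetical labelled data: only 'b' and 'c' appear in both series
s1 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s2 = pd.Series([1, 2, 3], index=['b', 'c', 'd'])

plain = s1 + s2                     # 'a' and 'd' have no partner, so they become NaN
filled = s1.add(s2, fill_value=0)   # a missing partner is treated as 0 before adding

print(plain)
print(filled)
```

With the operator, any label present in only one series yields NaN; with the method and fill_value, the missing side is replaced by the fill value first, so every label gets a number.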
a 10
e 20
i 30
o 40
u 50
dtype: int64
e 20
i 30
o 40
dtype: int64
a 10
e 20
i 30
dtype: int64