Data Handling Using Pandas-1 - Series Object Notes PDF
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing and creation of subset from large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
Why Pandas:
Reads and writes data of all data types (integer, float, string, etc.).
Can easily select subsets of data from bulky data sets and even combine multiple
datasets together.
ENVIRONMENT SETUP
The standard Python distribution does not come bundled with the Pandas module. A lightweight
option is to install Pandas with pip, the popular Python package installer, from the
command prompt:
pip install pandas
Series
DataFrame
Panel
These data structures are built on top of the NumPy library.
The best way to think of these data structures is that the higher dimensional data structure
is a container of its lower dimensional data structure. For example, DataFrame is a
container of Series, Panel is a container of DataFrame.
MUTABILITY
All Pandas data structures are value mutable (their contents can be changed). All except
Series are also size mutable; a Series is size immutable.
Note: DataFrame is widely used and is one of the most important data structures. Panel
is used much less; it was deprecated in pandas 0.20 and removed in pandas 0.25.
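As a minimal sketch of value mutability, the snippet below overwrites one element of a Series in place. The sample values and the name `s` are illustrative assumptions, not taken from the notes.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch)
s = pd.Series([10, 15, 18, 22])

# Value mutability: overwrite the element at position 2 in place
s.iloc[2] = 99

print(s.tolist())  # [10, 15, 99, 22]
print(len(s))      # the size is unchanged: 4
```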
SERIES
The first Pandas data structure that we are going to look at is the Series. A Series is a
homogeneous one-dimensional object: all of its data are of the same type and are
implicitly labelled with an index.
10 23 56 17 52 61 73 90 26 72
Key Points
Homogeneous data
Size Immutable
Value / Data Mutable
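A short sketch of what "homogeneous" means in practice: when the values have different numeric types, pandas picks one common dtype for the whole Series. The variable names and values are assumptions chosen for illustration.

```python
import pandas as pd

ints = pd.Series([10, 23, 56])     # all integers
mixed = pd.Series([10, 2.5, 56])   # integers and one float

print(ints.dtype)      # int64
print(mixed.dtype)     # float64: every element is stored as a float
print(mixed.tolist())  # [10.0, 2.5, 56.0]
```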
DATAFRAME
DataFrame is a two-dimensional array with heterogeneous data. For example,
The table represents the data of a sales team of an organization with their overall
performance rating. The data is represented in rows and columns. Each column
represents an attribute and each row represents a person.
Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float
Key Points
Heterogeneous data
Size Mutable
Data Mutable
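The sketch below builds a small DataFrame with the four columns listed above to show that each column keeps its own data type. Only the column names come from the notes; the two rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the notes
df = pd.DataFrame({
    "Name": ["Asha", "Ravi"],
    "Age": [32, 28],
    "Gender": ["Female", "Male"],
    "Rating": [4.2, 3.8],
})

print(df.dtypes)         # one dtype per column
print(type(df["Age"]))   # each column is a Series
```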
PANEL
Panel is a three-dimensional data structure with heterogeneous data. It is hard to
represent the panel in graphical representation. But a panel can be illustrated as a
container of DataFrame.
Key Points
Heterogeneous data
Size Mutable
Data Mutable
Index Data
0 10
1 15
2 18
3 22
1. data: takes various forms such as ndarray, list, dictionary or scalar value.
PROGRAM
# import the pandas library as pd
import pandas as pd
s = pd.Series()
print("The Series is : ")
print(s)

OUTPUT
The Series is :
Series([], dtype: float64)
CREATION OF NON EMPTY SERIES :
For creating non empty series, the user should specify arguments for data and index
as per requirement.
Q2 . Write a program to create a Series object “s” using the python sequence [1,2,3,4,5,6,7 ]
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
lis = [1, 2, 3, 4, 5, 6, 7]
print(" The Series is : ")
s = pd.Series(lis)
print(s)
# In this program, the source of the Series 's' is a list named 'lis'.

OUTPUT
The Series is :
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
The index in the above Series was generated automatically as 0, 1, …, 6, and the data
1, 2, …, 7 came from the list. This automatically generated index is known as the positional index.
A positional index takes an integer value that corresponds to the element's position in the Series,
starting from 0, whereas a labelled index takes any user-defined label as the index.
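To make the difference concrete, here is a small sketch that reads the same element once by label and once by position. The labels 'a', 'b', 'c' and the values are assumptions made for this illustration.

```python
import pandas as pd

s = pd.Series([1, 3, 5], index=["a", "b", "c"])  # labelled index

print(s.loc["b"])  # access by label: 3
print(s.iloc[1])   # access by position (counting from 0): 3
```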
Q3 . Write a program to create a Series object “s” using the python sequence (1,2,3,4,5,6,7)
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
tup = (1, 2, 3, 4, 5, 6, 7)
print(" The Series is : ")
s = pd.Series(tup)
print(s)
# In this program, the source of the Series 's' is a tuple named 'tup'.

OUTPUT
The Series is :
0    1
1    2
2    3
3    4
4    5
5    6
6    7
dtype: int64
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
lis = ['a', 'b', 'c', 'd', 'e']
print(" The Series is : ")
s = pd.Series(lis)
print(s)

OUTPUT
The Series is :
0    a
1    b
2    c
3    d
4    e
dtype: object
Q5 . Write a program to create a Series object “s” which will be consisting of integer numbers
upto the number 9 (ie) 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is :")
s = pd.Series(range(10))
print(s)
# In this program, the source of the Series 's' is the range() function.
# Note: the program can also be written with a list or tuple that stores
# the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, instead of using range(10).

OUTPUT
The Series is :
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
Q6 . Write a program to create a Series object “s” which will be consisting of integer numbers
1 , 3 , 5 , 7 , 9 , 11 , 13 , 15 , 17 , 19 , 21 , 23
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is :")
s = pd.Series(range(1, 25, 2))
print(s)
# In this program, the source of the Series 's' is the range() function.

OUTPUT
The Series is :
0      1
1      3
2      5
3      7
4      9
5     11
6     13
7     15
8     17
9     19
10    21
11    23
dtype: int64
Q7 . Write a program to create a Series object “s” which will be consisting of the string
“INFORMATICS PRACTICES”
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd

OUTPUT
The Series is :
0    INFORMATICS PRACTICES
dtype: object
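The program for Q7 appears to be cut off after the import line. One way it might be completed, consistent with the output shown, is sketched below; the remaining statements are an assumption modelled on the other programs in these notes.

```python
import pandas as pd

print(" The Series is :")
# A string passed as data is treated as one scalar value,
# not as a sequence of characters (assumed completion of Q7)
s = pd.Series("INFORMATICS PRACTICES")
print(s)
```

The whole string becomes a single element at index 0, which matches the one-row output above.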
Q8 . Write a program to create a Series object “s” which will be consisting of list of strings
“ INFORMATICS ” , “ PRACTICES”
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
lis = ["INFORMATICS", "PRACTICES"]
print(" The Series is :")
s = pd.Series(lis)
print(s)

OUTPUT
The Series is :
0    INFORMATICS
1      PRACTICES
dtype: object
In all the above examples, the index labels were generated by default: without being
set explicitly in the program, the index labels start at 0.
We can also specify the index as per our requirement.
Q9 . Write a program to create a Series object “s” which will be consisting of integer numbers
1 , 3 , 5 , 7 , 9 with the index numbers 1 , 2 , 3 , 4 , 5 ( user assigned index / customized
index)
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is :")
s = pd.Series(range(1,10,2), index = [1,2,3,4,5])
print(s)

OUTPUT
The Series is :
1    1
2    3
3    5
4    7
5    9
dtype: int64
THINK BEYOND
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is :")
s = pd.Series(range(1,10,2), index = [1,2,3,4])
print(s)

OUTPUT
Error :
ValueError: Length of passed values is 5, index implies 4
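The error arises because five values are paired with only four index labels. A minimal sketch of catching it is given below; the variable `message` is a hypothetical name used only for this illustration, and the exact wording of the error text can differ between pandas versions.

```python
import pandas as pd

message = None
try:
    # 5 values (1, 3, 5, 7, 9) but only 4 index labels
    s = pd.Series(range(1, 10, 2), index=[1, 2, 3, 4])
except ValueError as err:
    message = str(err)
    print("Could not create the Series:", message)
```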
NUMPY in PYTHON
NumPy stands for 'Numerical Python'. It is a package for data analysis and scientific
computing with Python. NumPy uses a multidimensional array object, and has functions
and tools for working with these arrays. The powerful n-dimensional array in NumPy
speeds up data processing.
Installing NumPy
NumPy can be installed by typing following command:
pip install numpy
Array :
An array is a data type used to store multiple values using a single identifier (variable
name).
An array contains an ordered collection of data elements where each element is of the
same type and can be referenced by its index (position).
The important characteristics of an array are:
• Each element of the array is of the same data type, though the values stored in them may
be different.
• The entire array is stored contiguously in memory. This makes operations on the array fast.
• Each element of the array is identified or referred to using the name of the array along
with the index of that element, which is unique for each element. The index of an element
is an integral value associated with the element, based on the element's position in the array.
For example consider an array with 5 numbers:
[10, 9, 99, 71, 90]
Here, the 1st value in the array is 10 and has the index value [0] associated with it; the
2nd value in the array is 9 and has the index value [1] associated with it, and so on. The
last value (in this case the 5th value) in this array has an index [4]. This is called zero
based indexing. This is very similar to the indexing of lists in Python. The idea of arrays
is so important that almost all programming languages support it in one form or another.
NumPy Array :
NumPy arrays are used to store lists of numerical data, vectors and matrices. The NumPy
library has a large set of routines (built-in functions) for creating, manipulating, and
transforming NumPy arrays. The NumPy array is officially called ndarray but is commonly
known as array.
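The five-number array used in the explanation above can be written as a NumPy array as follows. This is only a sketch; the variable name `arr` is an assumption.

```python
import numpy as np

arr = np.array([10, 9, 99, 71, 90])

print(type(arr).__name__)  # ndarray
print(arr[0])              # first element, index 0: 10
print(arr[4])              # fifth (last) element, index 4: 90
print(arr.dtype)           # one data type shared by every element
```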
Note :
PROGRAM
from numpy import random
x = random.rand()
print(x)

import numpy as np
r = np.random.rand()
print(r)

import numpy as np
r = np.random.rand(4)
print(r)
# np.random.rand(4) generates 4 random numbers between 0 and 1.

import numpy as np
for i in range(1,6):
    r = np.random.rand()
    print(r)
# np.random.rand() generates a random number between 0 and 1
# on each execution of the loop.

import numpy as np
for i in range(1,6):
    r = np.random.rand(3)
    print(r)
# np.random.rand(3) generates 3 random numbers between 0 and 1
# on each execution of the loop.

OUTPUT
(The output differs on every run because the numbers are random.)
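Because the values themselves change on every run, a sketch like the one below checks the properties that do not change: the shape of the result and the range of the values. The variable names are assumptions.

```python
import numpy as np

r = np.random.rand(4)  # array of 4 floats, each at least 0 and below 1

print(r.shape)                      # (4,)
print(((r >= 0) & (r < 1)).all())   # True
```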
The randint() method returns an integer number from the specified range.
It returns an array of specified shape and fills it with random integers from low
(inclusive) to high (exclusive)
PROGRAM
import numpy as np
x = np.random.randint(5)
print(x)
# Generates a random integer from 0 to 4: 0 is taken as low by default
# and 5 is the given (exclusive) value for high.

x = np.random.randint(5, size=3)
print(x)
# Generates 3 random integers from 0 to 4 and displays them as an array.

for i in range(1,5):
    x = np.random.randint(20,51)
    print(x)
# The for loop runs 4 times and generates 4 random integers from 20 to 50.

x = np.random.randint(20,51,size=5)
print(x)
# Generates 5 random integers from 20 to 50 and displays them as an array.

x = np.random.randint()
print(x)
# Raises a TypeError, because randint() needs at least one argument.

import numpy as np
x = np.random.randint(1,5,3)
print(x)
# 1 is taken as the low value, 5 as the high value and 3 is the number of
# random integers to generate (the size keyword is optional when all three
# arguments are given in this order).

OUTPUT (sample run of the last program)
[2 4 3]
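The "low inclusive, high exclusive" rule can be checked with a sketch such as the following, which draws many integers and inspects the smallest and largest. The sample size of 1000 is an arbitrary choice for this illustration.

```python
import numpy as np

# low=20 is inclusive, high=51 is exclusive
x = np.random.randint(20, 51, size=1000)

print(x.min() >= 20)  # True
print(x.max() <= 50)  # True: 51 itself is never returned
```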
Q10 . Write a program to create a Series object “s” which will be consisting of 5 random float
data type numbers
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
print(" The Series is :")
s = pd.Series(np.random.rand(5))
print(s)

OUTPUT (sample run)
The Series is :
0    0.978711
1    0.752327
2    0.207065
3    0.528203
4    0.144855
dtype: float64
* rand( ) is a method that returns a random float number between 0 and 1.
Q11 . Write a program to create a Series object “s” that has 5 elements in it using a ndarray
in the range 25 to 66
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
print(" The Series is :")
s = pd.Series(np.arange(25,66,10))
print(s)

OUTPUT
The Series is :
0    25
1    35
2    45
3    55
4    65
dtype: int32
Q12 . Write a program to create a Series object “s” that has 5 elements in it using a
ndarray in the range 25 to 66
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
print(" The Series is :")
s = pd.Series(np.linspace(25, 66, 5))
print(s)

OUTPUT
The Series is :
0    25.00
1    35.25
2    45.50
3    55.75
4    66.00
dtype: float64
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
s = pd.Series(np.linspace(25, 66 ,5 ))
print("The Series is ")
print(s)

OUTPUT
The Series is
0    25.00
1    35.25
2    45.50
3    55.75
4    66.00
dtype: float64
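Q11 and Q12 cover the same range with two different functions. The sketch below puts them side by side: arange() takes a step and excludes the stop value, while linspace() takes the number of values and includes both ends. The names `a` and `b` are assumptions.

```python
import numpy as np

a = np.arange(25, 66, 10)    # step of 10; the stop value 66 is excluded
b = np.linspace(25, 66, 5)   # 5 evenly spaced values; both ends included

print(a.tolist())  # [25, 35, 45, 55, 65]
print(b.tolist())  # [25.0, 35.25, 45.5, 55.75, 66.0]
```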
Q13 . Write a program to create a Series object “s” using a ndarray that is
created by tiling a list [ 67,78 ] three times.
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
print(" The Series is :")
s = pd.Series(np.tile([67,78],3))
print(s)

OUTPUT
The Series is :
0    67
1    78
2    67
3    78
4    67
5    78
dtype: int32
Q14 . Write a program to create a Series object “s” using a ndarray that generates 5
random float numbers between 0 and 1 with customized index 'a', 'b', 'c', 'd', 'e'.
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
print(" The Series is :")
s = pd.Series(np.random.rand(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)

OUTPUT (sample run)
The Series is :
a    0.769126
b    0.976503
c    0.577741
d    0.312644
e    0.905229
dtype: float64
CREATION OF SERIES FROM PYTHON DICTIONARY
Recall that a Python dictionary has key: value pairs, and a value can be retrieved quickly
when its key is known.
When a Series object is created from a dictionary, the keys of the dictionary become
the index of the Series and the values of the dictionary become the data of the Series
object.
Q15 . Write a program to create a Series object “s” using a dictionary dic = {'a' : 0.0, 'b' :
1.0, 'c' : 2.0}
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
dic = {'a' : 0.0, 'b' : 1.0, 'c' : 2.0}
print("The Series is :")
s = pd.Series(dic)
print(s)

OUTPUT
The Series is :
a    0.0
b    1.0
c    2.0
dtype: float64
Q16 . Write a program to create a Series object “s” using a dictionary dic = {'b' : 0.0, 'c' :
1.0, 'a' : 2.0}
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
dic = {'b' : 0.0, 'c' : 1.0, 'a' : 2.0}
print("The Series is :")
s = pd.Series(dic)
print(s)

OUTPUT
The Series is :
b    0.0
c    1.0
a    2.0
dtype: float64
Q17 . Write a program to create a Series object “s” using a dictionary dic = {'a' : 0., 'b' : 1.,
'c' : 2.}by display in the order 'b','c','d','a'
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd

OUTPUT
The Series is :
b    1.0
Note: In the above example, the user has assigned the index as per requirement in the
order 'b', 'c', 'd', 'a', whereas the dictionary has values for only 3 of these index labels (a, b and
c). In such a situation pandas automatically fills the unavailable value with NaN.
NaN stands for Not a Number and is defined in the NumPy module as a marker for an empty value. To
specify a missing value, use np.nan or None (the alias np.NaN was removed in NumPy 2.0).
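The Q17 program is incomplete in these notes. A sketch consistent with the question and the note, assuming the index is passed through the index argument of pd.Series, might look like this:

```python
import pandas as pd

dic = {'a': 0., 'b': 1., 'c': 2.}

# 'd' is not a key of the dictionary, so its value is filled with NaN
s = pd.Series(dic, index=['b', 'c', 'd', 'a'])

print(s)
print(s.isna().tolist())  # [False, False, True, False]
```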
CREATION OF SERIES FROM SCALAR VALUE
A scalar value refers to a single value such as 5, 3.14, 'info' etc…
PROGRAM
import pandas as pd

OUTPUT
0    5
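Only fragments of the scalar examples survive here. As a hedged sketch of the idea, the snippet below creates a Series from the scalar 5, first without an index and then with one; the labels 'x', 'y', 'z' are assumptions chosen for illustration.

```python
import pandas as pd

s1 = pd.Series(5)                          # no index given: one element at index 0
s2 = pd.Series(5, index=['x', 'y', 'z'])   # the scalar is repeated for every label

print(s1.tolist())  # [5]
print(s2.tolist())  # [5, 5, 5]
```

When an index is supplied, the length of the Series is set by the index, not by the scalar.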
Q18 . Write a program to create a Series object “s” as per the following situation :
Total no of medals to be won is 200 in Inter University games held every alternate year.
Create a series that stores medals for games to be held in 2020-2029.
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is : ")
s = pd.Series(200, range(2020, 2029, 2))
print(s)

OUTPUT
The Series is :
2020    200
2022    200
2024    200
2026    200
2028    200
dtype: int64
Q19 . Write a program to create a series object that stores the initial budget allocated
(50000/- each) for the four quarters of the year: Qtr1, Qtr2, Qtr3 and Qtr4
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is : ")
s = pd.Series(50000, ["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
print(s)

OUTPUT
The Series is :
Qtr1    50000
Qtr2    50000
Qtr3    50000
Qtr4    50000
dtype: int64
USING LOOP TO CREATE DATA & INDEX IN SERIES
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
print(" The Series is : ")
s = pd.Series(5, index=[x for x in range(0, 10, 2)])
print(s)

OUTPUT
The Series is :
0    5
2    5
4    5
6    5
8    5
dtype: int64
Q20 . A list namely section stores the section names A,B,C,D of class 12. Another list
contri stores the contribution made by these students to a charity fund
endorsed by the school. Write code to create a Series object that stores the
contribution amount as the values and the section names as the indexes.
PROGRAM
# import the pandas library, aliasing it as pd
import pandas as pd
contri = [10000,12000,10000,11000]
sec = ["12 A", "12 B", "12 C", "12 D"]
print(" The Series is : ")
s = pd.Series(contri, sec)
print(s)

OUTPUT
The Series is :
12 A    10000
12 B    12000
12 C    10000
12 D    11000
dtype: int64
Even though the list named contri contains integer values,
why is the data type of the Series shown as float32? Because the program below passes
dtype=np.float32 explicitly when the Series is created.
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
contri = [10000,12000,10000,11000]
sec = ["12 A", "12 B", "12 C", "12 D"]
print(" The Series is : ")
s = pd.Series(contri, sec, dtype=np.float32)
print(s)

OUTPUT
The Series is :
12 A    10000.0
12 B    12000.0
12 C    10000.0
12 D    11000.0
dtype: float32
Q21 . The sequences section and contri store the section names 12 A, B, C, D, E
and the contributions made for a charity, respectively (10000, 12000, 10000, 11000, nil).
Your school has decided to donate as much as each section contributed,
i.e. the donation will be doubled.
Write code to create a Series that stores the contribution amounts as the
values and the section names as the indexes, with float32 as the data type.
PROGRAM
import pandas as pd
import numpy as np
# contri = [10000,12000,10000,11000, None]
# np.nan marks the missing contribution (np.NaN was removed in NumPy 2.0)
contri = np.array([10000,12000,10000,11000, np.nan])
sec = ["12 A", "12 B", "12 C" , "12 D" , "12 E"]
print(" The Series is : ")
s = pd.Series(data=contri*2 , index=sec, dtype=np.float32)
print(s)

OUTPUT
The Series is :
12 A    20000.0
12 B    24000.0
12 C    20000.0
12 D    22000.0
12 E        NaN
dtype: float32
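The program keeps the plain-list version of contri commented out. One likely reason is that multiplying a Python list by 2 repeats the list instead of doubling its values. The sketch below, offered as an assumption about how the list version could still be used, builds the Series first and doubles it afterwards.

```python
import pandas as pd

contri = [10000, 12000, 10000, 11000, None]   # None marks the missing amount
sec = ["12 A", "12 B", "12 C", "12 D", "12 E"]

print(len(contri * 2))  # 10: list * 2 repeats the list, it does not double the values

# Build the Series first, then double it element by element
s = pd.Series(contri, index=sec, dtype="float32") * 2
print(s)
```

With a float dtype, None is stored as NaN, and NaN multiplied by 2 is still NaN.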
PROGRAM
# import the pandas library as pd and numpy as np
import pandas as pd
import numpy as np
contri = 10000
sec = ["12 A", "12 B", "12 C" , "12 D" ]
print(" The Series is : ")
s = pd.Series(contri * 2, sec)
print(s)

OUTPUT
The Series is :
12 A    20000
12 B    20000
12 C    20000
12 D    20000
dtype: int64