Python Pandas - Series Notes
Pandas is an open-source Python library for high-performance data manipulation and analysis. It
was created in 2008 by Wes McKinney. Before Pandas, Python was well suited to data preparation,
but it offered only limited support for data analysis. Pandas filled this gap and greatly
improved Python's data analysis capabilities.
DataFrame and Series are the two data structures that Pandas provides for processing data.
The best way to think of these data structures is that the higher-dimensional data structure is a container
of its lower-dimensional data structure: a DataFrame is a container of Series. (Older versions of Pandas
also had a three-dimensional Panel, a container of DataFrames, but it was removed in version 0.25.)
These data structures are discussed below.
Python Pandas Series
A Series is defined as a one-dimensional array capable of storing a variety of data types. The term "index"
refers to the row labels of a Series. We can easily convert a list, tuple, or dictionary into a Series
using the Series() constructor. A Series cannot contain multiple columns.
Its main parameter is:
Data: It can be a list, tuple, dictionary, NumPy array, or scalar value.
The constructor also accepts optional parameters such as index and dtype.
Key Points
Series:
• Homogeneous data
• Size immutable
• Values of data mutable
DataFrame:
• Heterogeneous data
• Size mutable
• Data mutable
Series: Creation of Series from ndarray, dictionary and scalar values
Series
A Pandas Series is a one-dimensional labeled ndarray structure. A Pandas Series can be thought of as a
column in a spreadsheet. It consists of two main components: the labels and the data.
For example
0 'Nirmal'
1 20
2 5.3
3 False
dtype: object
Here, the printed Series has two columns: the labels (0, 1, 2 and 3) and the data ('Nirmal', 20, 5.3, False).
The labels are the index values assigned to each data point, while the data represents the actual values
stored in the Series.
Note: A Pandas Series has a single dtype (data type) that describes all of its elements, which is why it is
called homogeneous. When the values are of mixed types, as in the example above, they are stored under
the generic object dtype.
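import pandas as pd
# Creating an empty Series; the dtype is given explicitly because the
# default dtype of an empty Series differs between pandas versions
ser = pd.Series(dtype='float64')
print(ser)
Output:
Series([], dtype: float64)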
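The note on dtype above can be illustrated with a small sketch. The variable names (ints, floats, mixed) are chosen here for illustration only; the point is that pandas picks one dtype for the whole Series.

```python
import pandas as pd

# All integers: one numeric dtype is chosen for the whole Series.
ints = pd.Series([1, 2, 3])

# Integers mixed with a float are upcast to float64.
floats = pd.Series([1, 2.5, 3])

# Values with no common numeric type fall back to the generic object dtype.
mixed = pd.Series(['Nirmal', 20, 5.3, False])

print(ints.dtype)    # int64
print(floats.dtype)  # float64
print(mixed.dtype)   # object
```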
Creating a Series from an array: To create a Series from a NumPy array, import the numpy
module and build the array with its array() function.
import pandas as pd
import numpy as np
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)
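Besides a NumPy array, a Series can also be built from a list, tuple, dictionary or scalar value. The following is a minimal sketch; the values and labels are made up for illustration.

```python
import pandas as pd

# From a list or tuple: a default integer index 0, 1, 2, ... is generated.
from_list = pd.Series([10, 20, 30])
from_tuple = pd.Series((10, 20, 30))

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({'a': 101, 'b': 102, 'c': 103})

# From a scalar: the value is repeated once per index label,
# so an index is supplied to fix the length.
from_scalar = pd.Series(7, index=['x', 'y', 'z'])

print(from_dict)
print(from_scalar)
```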
Mathematical Operations on Series objects
We can perform arithmetic operations (+, -, *, /) on two or more Series objects.
The arithmetic operation is performed only on matching index labels. For labels
that appear in only one of the Series, it produces NaN values.
If the value at a matching label is itself missing (NaN), the result at that label is
also NaN. Values of incompatible types (for example a number and a string)
generally raise a TypeError instead.
Program-1
import pandas as pd
S1 = pd.Series([12,23,34])
S2 = pd.Series([10,20,10])
print("Addition of Series with matching indexes")
print(S1 + S2)
Output –
Addition of Series with matching indexes
0 22
1 43
2 44
dtype: int64
Program-2
import pandas as pd
S1 = pd.Series([12,23,34,56])
S2 = pd.Series([10,20,10])
print("Addition of Series of Different sizes")
print(S1 + S2)
Output –
Addition of Series of Different sizes
0 22.0
1 43.0
2 44.0
3 NaN
dtype: float64
Program-3
import pandas as pd
S1 = pd.Series([12,23,34])
S2 = pd.Series([10,20,10],index=['a','b','c'])
print("Addition of Series with Non Matching Index")
print(S1 + S2)
Output –
Addition of Series with Non Matching Index
0 NaN
1 NaN
2 NaN
a NaN
b NaN
c NaN
dtype: float64
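When NaN for non-matching labels is not wanted, one option is the Series.add method with its fill_value parameter, which substitutes a value for the missing side before adding. This is a sketch of that option, not something the programs above use.

```python
import pandas as pd

s1 = pd.Series([12, 23, 34, 56])
s2 = pd.Series([10, 20, 10])

# The + operator leaves NaN where a label exists in only one Series.
plain = s1 + s2

# Series.add with fill_value=0 treats the missing side as 0 instead.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```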
Program-4
What will be the output produced by the following programming statements 1 and 2?
import pandas as pd
S1=pd.Series(data=[31,41,51])
print(S1>40)      # Statement 1
print(S1[S1>40])  # Statement 2
Output –
Statement-1
0 False
1 True
2 True
dtype: bool
Statement-2
1 41
2 51
dtype: int64
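Statement 1 builds a boolean Series (a mask) and Statement 2 uses it as a filter. As a small extension under the same data, masks can also be combined; the name between is illustrative only.

```python
import pandas as pd

s1 = pd.Series(data=[31, 41, 51])

# A comparison produces a boolean Series (a mask) with the same index.
mask = s1 > 40

# Masks can be combined with & (and), | (or) and ~ (not);
# each condition needs its own parentheses.
between = s1[(s1 > 35) & (s1 < 50)]

print(mask.dtype)  # bool
print(between)
```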
Summary
Pandas Series is a one-dimensional, array-like labeled structure.
Series labels need not be unique but must be of a hashable type.
Homogeneous – Series elements share a single data type (dtype).
Size-immutable – Once created, the size of a Series object cannot be
changed.
The Series object supports both integer and label-based indexing and
provides various methods for performing operations involving the index.
A Series can be created from a list, array, dictionary or scalar value.
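The point that labels need not be unique can be seen in a short sketch (the data is made up for illustration): a repeated label simply returns every matching element.

```python
import pandas as pd

# Labels may repeat; 'a' appears twice here.
s = pd.Series([1, 2, 3], index=['a', 'b', 'a'])

# A unique label returns a single value ...
print(s['b'])

# ... while a repeated label returns a Series holding every match.
print(s['a'])
```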
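Head function
The head function in pandas displays the first five rows of the dataframe by default.
It takes a single parameter: the number of rows. We can use this parameter to
display the number of rows of our choice.
Syntax
The head function is defined as follows:
dataframe.head(N)
N refers to the number of rows. If no parameter is passed, the first five rows are
returned.
The head function also supports negative values of N. In that case, all rows except
the last |N| rows are returned.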
import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing']})
print(df.head()) # By default
print('\n')
print(df.head(3)) # Printing first 3 rows
print('\n')
print(df.head(-2)) # Printing all except the last 2 rows
Output
Sports
0 Football
1 Cricket
2 Baseball
3 Basketball
4 Tennis
Sports
0 Football
1 Cricket
2 Baseball
Sports
0 Football
1 Cricket
2 Baseball
3 Basketball
4 Tennis
5 Table-tennis
6 Archery
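The example above uses a DataFrame, but head is also available on a Series and behaves the same way. A minimal sketch, reusing the sports names as a Series:

```python
import pandas as pd

s = pd.Series(['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'])

first_five = s.head()     # default: first 5 elements
first_three = s.head(3)   # first 3 elements
all_but_two = s.head(-2)  # everything except the last 2 elements

print(all_but_two)
```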
Tail function
The tail function in pandas displays the last five rows of the dataframe by default. It
takes a single parameter: the number of rows. We can use this parameter to
display the number of rows of our choice.
Syntax
The tail function is defined as follows:
dataframe.tail(N)
N refers to the number of rows. If no parameter is passed, the last five rows are
returned.
The tail function also supports negative values of N. In that case, all rows except
the first |N| rows are returned.
import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing']})
print(df.tail()) # By default
print('\n')
print(df.tail(3)) # Printing last 3 rows
print('\n')
print(df.tail(-2)) # Printing all except the first 2 rows
Output
Sports
4 Tennis
5 Table-tennis
6 Archery
7 Swimming
8 Boxing
Sports
6 Archery
7 Swimming
8 Boxing
Sports
2 Baseball
3 Basketball
4 Tennis
5 Table-tennis
6 Archery
7 Swimming
8 Boxing
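One way to read the tail results is as positional slices. The sketch below checks that reading with iloc; it is offered as an aid to understanding rather than as how tail must be used.

```python
import pandas as pd

df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
                              'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing']})

# tail(3) selects the same rows as the positional slice iloc[-3:].
same_last_three = df.tail(3).equals(df.iloc[-3:])

# tail(-2) drops the first 2 rows, like iloc[2:].
same_without_first_two = df.tail(-2).equals(df.iloc[2:])

print(same_last_three, same_without_first_two)
```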
Slicing with integers takes place position-wise and not label-wise in a Series object.
The index [ ] operator can be used to perform indexing and slicing operations on a
Series object. The index [ ] operator can accept either-
Index/labels
Integer index positions
Using the index operator with labels-
The index operator can be used in the following ways-
Using a single label inside the square brackets- Using a single label/index inside
the square brackets will return only the corresponding element referred to by that
label/index.
Using multiple labels- We can pass multiple labels, in any order, that are present in
the Series object. The multiple labels must be passed as a list, i.e. the multiple
labels must be separated by commas and enclosed in double square brackets.
Passing a label that is not present in the Series object should be avoided: older
versions of pandas returned NaN for it with a warning, but current versions raise
a KeyError.
Output of s[['b', 'a', 'f']], where s is the Series created in the next program:
b 102
a 101
f 106
dtype: int64
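The single-label and multiple-label forms described above can be sketched as follows. The label 'z' is a deliberately missing label chosen for illustration.

```python
import pandas as pd

d = {'a': 101, 'b': 102, 'c': 103, 'd': 104, 'e': 105, 'f': 106}
s = pd.Series(d)

# A single label returns the single matching value.
print(s['b'])

# A list of labels returns a new Series in the order requested.
picked = s[['b', 'a', 'f']]
print(picked)

# A label that is not in the index raises KeyError in current pandas.
try:
    s['z']
except KeyError as err:
    print('KeyError:', err)
```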
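Using a label slice- When labels are used in a slice, both the start label and the
stop label are included in the result.
import pandas as pd
d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}
s=pd.Series(d)
u=s['b':'e']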
print(u)
Output
b 102
c 103
d 104
e 105
dtype: int64
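The difference between slicing by label and slicing by position is easy to miss, so here is a short sketch using the explicit .loc (labels) and .iloc (positions) accessors. Using these accessors is a choice made for clarity here; the program above uses the plain [ ] operator.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 102, 'c': 103, 'd': 104, 'e': 105, 'f': 106})

by_label = s.loc['b':'e']   # labels b, c, d, e (stop label included)
by_position = s.iloc[1:4]   # positions 1, 2, 3 (stop position excluded)

print(by_label.index.tolist())
print(by_position.index.tolist())
```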
Slicing a Series object using integer index positions-
The concept of slicing a Series object is similar to that of slicing Python lists, strings
etc. Even though the data type of the labels can be anything, each element of the
Series object is associated with two integer positions:
In the forward indexing method the elements are numbered 0, 1, 2, 3, … with 0
being assigned to the first element, 1 being assigned to the second element and so
on.
In the backward indexing method the elements are numbered -1, -2, -3, …
with -1 being assigned to the last element, -2 being assigned to the second last
element and so on.
d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}
s=pd.Series(d)
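The Series object has the following integer index positions-
Label:               a    b    c    d    e    f
Value:               101  102  103  104  105  106
Forward position:    0    1    2    3    4    5
Backward position:   -6   -5   -4   -3   -2   -1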
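The forward and backward numbering can be listed programmatically. This sketch uses .iloc for positional access, which is an assumption about style rather than something the notes prescribe.

```python
import pandas as pd

d = {'a': 101, 'b': 102, 'c': 103, 'd': 104, 'e': 105, 'f': 106}
s = pd.Series(d)
n = len(s)

# Print each label with its forward and backward position.
for forward, label in enumerate(s.index):
    backward = forward - n
    # Both positions address the same element.
    assert s.iloc[forward] == s.iloc[backward]
    print(label, forward, backward)
```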
Slice concept-
The basic concept of slicing using integer index positions is common to Python
objects such as strings, lists, tuples, Series, DataFrames etc. A slice creates a new object
using elements of an existing object. It is written as ExistingObjectName[start :
stop : step], where start, stop and step are integers. The element at position stop
is not included in the result.
import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
x=s[1: :2]
print('x=\n', x)
y=s[-1: :-1]
print('y=\n', y)
z=s[1: -2: 2]
print('z=\n', z)
Output
x=
b 111
d 131
f 151
dtype: int64
y=
f 151
e 141
d 131
c 121
b 111
a 101
dtype: int64
z=
b 111
d 131
dtype: int64
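Since positional slicing follows the same rules as list slicing, the three slices above can be cross-checked against a plain Python list. A minimal sketch, using .iloc for the positional slices:

```python
import pandas as pd

d = {'a': 101, 'b': 111, 'c': 121, 'd': 131, 'e': 141, 'f': 151}
s = pd.Series(d)
values = list(d.values())

# (start, stop, step) triples; None means "left out", as in s[1::2].
for start, stop, step in [(1, None, 2), (-1, None, -1), (1, -2, 2)]:
    from_series = s.iloc[start:stop:step].tolist()
    from_list = values[start:stop:step]
    print(from_series, from_series == from_list)
```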
Modifying elements of Series object-
The elements of a Series object can be modified using any of the following
methods-
Using index [ ] operator to modify single/multiple values
# Modifying a Series object using the index [ ] operator
import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
a 777
b 111
c 555
d 131
e 141
f 666
dtype: int64
s=
a 777
b 0
c 1
d 2
e 141
f 666
dtype: int64
Output s=
a 101
b 111
c 121
d 999
e 141
f 777
dtype: int64
Output s=
a 101
b 9
c 121
d 131
e 8
f 7
dtype: int64
s=
a 101
b 33
c 121
d 44
e 8
f 55
dtype: int64
import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
s[1: :2] = [1,2,3]
print('s=\n', s)
Output s=
a 101
b 1
c 121
d 2
e 141
f 3
dtype: int64
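The statements behind some of the earlier outputs in this section are not shown in the notes. The sketch below is one possible set of assignments, assumed here for illustration, that modifies a single value, several values by label, and a run of values by label slice.

```python
import pandas as pd

d = {'a': 101, 'b': 111, 'c': 121, 'd': 131, 'e': 141, 'f': 151}
s = pd.Series(d)

# Single label: replace one value.
s['a'] = 777

# List of labels: replace several values at once.
s[['c', 'f']] = [555, 666]
after_labels = s.tolist()
print(after_labels)

# Label slice (stop label included): replace a run of values.
s.loc['b':'d'] = [0, 1, 2]
after_slice = s.tolist()
print(after_slice)
```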
Output
s=
have 101
A 111
Nice 121
Day 131
dtype: int64
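The output above shows a Series whose labels have been replaced, but the code that produced it is not included in the notes. One way to get such a result, assuming a hypothetical four-element starting Series, is to assign a new list of labels to the index attribute:

```python
import pandas as pd

# Hypothetical starting Series; only the relabelled result appears in the notes.
s = pd.Series([101, 111, 121, 131], index=['a', 'b', 'c', 'd'])

# Assigning a list of the same length to .index replaces all labels in place.
s.index = ['have', 'A', 'Nice', 'Day']
print(s)
```

The new list must have exactly as many labels as the Series has elements.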