Python Pandas - Series Notes

Pandas is an open-source Python library created in 2008 for high-performance data manipulation and analysis, featuring data structures like Series and DataFrame. Series is a one-dimensional labeled array, while DataFrame is a two-dimensional structure that can hold heterogeneous data types. The library provides various functions for data operations, including indexing, slicing, and mathematical operations.

Uploaded by

Daniel Mathew

Python Pandas

Pandas is an open-source Python library for high-performance data manipulation and analysis. It
was created in 2008 by Wes McKinney. Before Pandas, Python was suitable for data preparation,
but it offered only limited support for data analysis. Pandas filled this gap and enhanced
Python's data analysis capabilities.
DataFrame and Series are the two data structures that Pandas provides for processing data.
The best way to think of these data structures is that a higher-dimensional data structure is a
container of its lower-dimensional data structure. For example, a DataFrame is a container of Series.
Older pandas releases also had Panel, a container of DataFrame, but it has been removed from
current versions. These data structures are discussed below.
Python Pandas Series
A Series is defined as a one-dimensional array capable of storing a variety of data types. The term
"index" refers to the row labels of a Series. We can easily convert a list, tuple, or dictionary
into a Series using the Series() method. A Series cannot contain multiple columns.
Its main parameter is:
Data: It can be any list, dictionary, or scalar value.
Key Points
• Homogeneous data
• Size Immutable
• Values of Data Mutable

Python Pandas DataFrame


It is a widely used data structure of pandas and works with a two-dimensional array with labeled
axes (rows and columns). As a standard way of storing data, a DataFrame has two distinct
indexes: a row index and a column index. It has the following characteristics:
The columns can be of heterogeneous types such as int, bool, etc.

• Heterogeneous data
• Size Mutable
• Data Mutable
Series: Creation of Series from ndarray, Dictionary, Scalar values
Series
A Pandas Series is a one-dimensional labeled ndarray structure. A Pandas Series can be thought of as a
column in a spreadsheet. It consists of two main components: the labels and the data.

For example
0 Nirmal
1 20
2 5.3
3 False
dtype: object

Here, the Series has two parts: labels (0, 1, 2 and 3) and data ('Nirmal', 20, 5.3, False).

The labels are the index values assigned to each data point, while the data represents the actual values
stored in the Series.

Note: Each Pandas Series has a single dtype (data type), which it uses to manage and represent the
underlying data. When the elements are of mixed types, as in the example above, they are stored
with the generic object dtype.
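
The note on dtype can be checked with a small sketch. The variable names ints and mixed are only illustrative and are not part of the original notes.

```python
import pandas as pd

# All-integer data is stored with a single integer dtype.
ints = pd.Series([10, 20, 30])

# Mixed Python values fall back to the generic object dtype.
mixed = pd.Series(['Nirmal', 20, 5.3, False])

print(ints.dtype)
print(mixed.dtype)
```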

Creating a Pandas Series


To create a Series, any of the following methods can be used. Make sure to import the pandas library.
Creating an empty Series: The Series() function of Pandas is used to create a series. The most basic
series that can be created is an empty Series.

import pandas as pd
# Creating an empty series (dtype is passed explicitly because the default differs between versions)
ser = pd.Series(dtype='float64')
print(ser)
Output:
Series([], dtype: float64)

In older pandas versions the default data type of an empty Series is float64. From pandas 2.0
onwards it is object, so it is safer to pass dtype explicitly.

Creating a series from an array: To create a series from a NumPy array, we have to import the numpy
module and use its array() function.
import pandas as pd
import numpy as np
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)
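
Creating a series from a dictionary or a scalar value: the section heading also mentions these two cases. A minimal sketch follows, in which the names marks and fives and the sample values are assumptions made for illustration.

```python
import pandas as pd

# From a dictionary: the keys become the index labels.
marks = pd.Series({'a': 101, 'b': 102, 'c': 103})

# From a scalar: the value is repeated for every label in the given index.
fives = pd.Series(5, index=['x', 'y', 'z'])

print(marks)
print(fives)
```

With a scalar, the index argument decides how many elements the Series has.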
Mathematical Operation on Series object
We can perform arithmetic operations (+, -, *, /) on two or more Series objects.
The arithmetic operation is performed only on matching index labels. For non-matching
labels it produces NaN values.

If the data items at a matching index are not compatible for the operation (for example,
an integer and a string), pandas typically raises a TypeError.
Program-1
import pandas as pd
S1 = pd.Series([12,23,34])
S2 = pd.Series([10,20,10])
print("Addition of Series with matching indexes")
print(S1 + S2)

Output –
Addition of Series with matching indexes
0 22
1 43
2 44
dtype: int64
Program-2
import pandas as pd
S1 = pd.Series([12,23,34,56])
S2 = pd.Series([10,20,10])
print("Addition of Series of Different sizes")
print(S1 + S2)

Output –
Addition of Series of Different sizes
0 22.0
1 43.0
2 44.0
3 NaN
dtype: float64

Program-3
import pandas as pd
S1 = pd.Series([12,23,34])
S2 = pd.Series([10,20,10],index=['a','b','c'])
print("Addition of Series With Non Matching Index")
print(S1 + S2)

Output –
Addition of Series With Non Matching Index
0 NaN
1 NaN
2 NaN
a NaN
b NaN
c NaN
dtype: float64
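
When NaN for non-matching indexes is not wanted, the add() method of Series accepts a fill_value argument. The sketch below reuses the data of Program-2. The names plain and filled are illustrative.

```python
import pandas as pd

S1 = pd.Series([12, 23, 34, 56])
S2 = pd.Series([10, 20, 10])

# Plain + gives NaN at index 3 because S2 has no element with that label.
plain = S1 + S2

# add() with fill_value=0 treats the missing element as 0 before adding.
filled = S1.add(S2, fill_value=0)

print(plain)
print(filled)
```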

Program-4
What will be the output produced by the following programming statements 1 & 2?
import pandas as pd
S1=pd.Series(data=[31,41,51])
print(S1>40) # Statement 1
print(S1[S1>40]) # Statement 2
Output –
Statement-1
0 False
1 True
2 True
dtype: bool
Statement-2
1 41
2 51
dtype: int64

Summary
• Pandas Series is a one-dimensional, array-like labeled structure.
• Series labels need not be unique but must be of a hashable type.
• Homogeneous – Series elements must be of the same data type.
• Size-immutable – Once created, the size of a Series object cannot be
changed.
• The Series object supports both integer and label-based indexing and
provides various methods for performing operations involving the index.
• A Series can be created using a list, array, dictionary or scalar value.
Head function
The head function in pandas displays the first five rows of the dataframe by default.
It takes in a single parameter: the number of rows. We can use this parameter to
display the number of rows of our choice.

Syntax of head function is defined as follows:


dataframe.head(N)

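N refers to the number of rows. If no parameter is passed, the first five rows are
returned.
The head function also supports negative values of N. In that case, all rows except
the last N rows are returned.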

import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing']})
print(df.head()) # By default
print('\n')
print(df.head(3)) # Printing first 3 rows
print('\n')
print(df.head(-2)) # Printing all except the last 2 rows

Sports
0 Football
1 Cricket
2 Baseball
3 Basketball
4 Tennis

Sports
0 Football
1 Cricket
2 Baseball

Sports
0 Football
1 Cricket
2 Baseball
3 Basketball
4 Tennis
5 Table-tennis
6 Archery
Tail function
The tail function in pandas displays the last five rows of the dataframe by default. It
takes in a single parameter: the number of rows. We can use this parameter to
display the number of rows of our choice.

Syntax
The tail function is defined as follows:
dataframe.tail(N)

N refers to the number of rows. If no parameter is passed, the last five rows are
returned.
The tail function also supports negative values of N. In that case, all rows except
the first N rows are returned.
import pandas as pd
# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing']})
print(df.tail()) # By default
print('\n')
print(df.tail(3)) # Printing last 3 rows
print('\n')
print(df.tail(-2)) # Printing all except the first 2 rows

Sports
4 Tennis
5 Table-tennis
6 Archery
7 Swimming
8 Boxing

Sports
6 Archery
7 Swimming
8 Boxing

Sports
2 Baseball
3 Basketball
4 Tennis
5 Table-tennis
6 Archery
7 Swimming
8 Boxing
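
The head and tail functions are also available on a Series object and behave the same way. A short sketch follows, with sample data and variable names chosen only for illustration.

```python
import pandas as pd

s = pd.Series([101, 102, 103, 104, 105, 106, 107],
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

first_five = s.head()          # first five elements by default
last_two = s.tail(2)           # last two elements
all_but_last_two = s.head(-2)  # everything except the last two elements

print(first_five)
print(last_two)
print(all_but_last_two)
```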

Indexing/Slices from Series Object


A slice of a Series object is created using the syntax <object>[start : stop :
step], where start and stop signify the positions of the elements, not their index labels.
The slice of a Series object is also a pandas Series object.

Slicing takes place position-wise and not index-wise in a Series object.

The index [] operator can be used to perform indexing and slicing operations on a
Series object. The index [] operator can accept either:
• Index labels
• Integer index positions
Using the index operator with labels-
The index operator can be used in the following ways-
Using a single label inside the square brackets- Using a single label inside
the square brackets returns only the element referred to by that label.
Using multiple labels- We can pass multiple labels, in any order, that are present in
the Series object. The multiple labels must be passed as a list, i.e. the
labels must be separated by commas and enclosed in double square brackets.
Passing a label that is not present in the Series object should be avoided:
older pandas versions returned NaN as its value, while current versions raise a
KeyError.

# indexing a Series object multiple labels


import pandas as pd
d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}
s=pd.Series(d)
u=s[['b', 'a', 'f']]
print(u)

Output

b 102
a 101
f 106
dtype: int64
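
The single-label case and the missing-label case can be sketched as follows, assuming pandas 1.0 or later. The label 'z' and the variable names are only illustrative. The get() method is used here as one way to look up a label that may be absent.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 102, 'c': 103})

one = s['b']          # a single label returns the element itself
missing = s.get('z')  # get() returns None when the label is absent

# A list that contains an absent label raises KeyError in current pandas.
try:
    s[['a', 'z']]
    raised = False
except KeyError:
    raised = True

print(one, missing, raised)
```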

Using slice notation start label : end label-


Inside the index operator we can pass start label : end label. Here, contrary to the
usual slice concept, all the items from the start label up to and including the end
label are returned.
# indexing a Series object using startlabel : endlabel

import pandas as pd
d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}
s=pd.Series(d)
u=s['b':'e']
print(u)
Output

b 102
c 103
d 104
e 105
dtype: int64
Slicing a Series object using Integer Index positions-
The concept of slicing a Series object is similar to that of slicing Python lists, strings
etc. Even though the data type of the labels can be anything, each element of the
Series object is associated with two integer positions:

In the forward indexing method the elements are numbered 0, 1, 2, 3, … with 0
being assigned to the first element, 1 being assigned to the second element and so
on.

In the backward indexing method the elements are numbered -1, -2, -3,
… with -1 being assigned to the last element, -2 being assigned to the second last
element and so on.
d={'a':101, 'b':102, 'c':103, 'd':104, 'e':105, 'f':106}
s=pd.Series(d)
The Series object has the following integer index positions-
Label      a   b   c   d   e   f
Forward    0   1   2   3   4   5
Backward  -6  -5  -4  -3  -2  -1

Slice concept-
The basic concept of slicing using integer index positions is common to Python
objects such as strings, lists, tuples, Series, DataFrames etc. A slice creates a new object
using elements of an existing object. It is created as ExistingObjectName[start :
stop : step], where start, stop and step are integers.

# Slicing a Series object

import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
x=s[1: :2]
print('x=\n', x)
y=s[-1: :-1]
print('y=\n', y)
z=s[1: -2: 2]
print('z=\n', z)

Output
x=
b 111
d 131
f 151
dtype: int64
y=
f 151
e 141
d 131
c 121
b 111
a 101
dtype: int64
z=
b 111
d 131
dtype: int64
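
The difference between position-based and label-based slicing can be made explicit with the iloc and loc properties. This is a small sketch with illustrative variable names.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 111, 'c': 121, 'd': 131, 'e': 141, 'f': 151})

# Position-based slice: positions 1, 2 and 3 (the stop position is excluded).
by_position = s.iloc[1:4]

# Label-based slice: labels 'b' to 'e' (the end label is included).
by_label = s.loc['b':'e']

print(by_position)
print(by_label)
```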
Modifying elements of Series object-
The elements of a Series object can be modified using any of the following
methods-
Using index [ ] operator to modify single/multiple values
# Modifying a Series object using the index [ ] operator
import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
s['c'] = 555 # single label
s[['f','a']] = [666,777] # list of labels
print('s=\n', s)
s['b':'d']=[0,1,2] # label slice, end label included
print('s=\n', s)

Output
s=
a 777
b 111
c 555
d 131
e 141
f 666
dtype: int64
s=
a 777
b 0
c 1
d 2
e 141
f 666
dtype: int64

Using at/iat property to modify a single value

The at property selects a single element by its label and the iat property selects a single
element by its integer position. In this example the element labelled 'd' is changed to 999
and the last element, 'f', is changed to 777.

Output
s=
a 101
b 111
c 121
d 999
e 141
f 777
dtype: int64
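
The statements that produced the at/iat output above are not shown in these notes. The following is a minimal sketch that gives the same result, assuming at was used with the label 'd' and iat with position 5. The original may have combined them differently.

```python
import pandas as pd

d = {'a': 101, 'b': 111, 'c': 121, 'd': 131, 'e': 141, 'f': 151}
s = pd.Series(d)

s.at['d'] = 999  # assumed statement: at selects one element by label
s.iat[5] = 777   # assumed statement: iat selects one element by position

print('s=\n', s)
```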

Using loc, iloc property to modify single /multiple values


#Modifying a Series object loc/iloc property
import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
s.loc['b'] = 9
s.loc['e':'f'] = [8,7]
print('s=\n', s)
s.iloc[1: :2] = [33,44,55]
print('s=\n', s)

Output
s=
a 101
b 9
c 121
d 131
e 8
f 7
dtype: int64

s=
a 101
b 33
c 121
d 44
e 8
f 55
dtype: int64

Using slice method to modify multiple values


# Modifying a Series object slice method

import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131, 'e':141, 'f':151}
s=pd.Series(d)
s[1: :2] = [1,2,3]
print('s=\n', s)

Output
s=
a 101
b 1
c 121
d 2
e 141
f 3
dtype: int64

Changing indexes of Series object-


The index property can be used to change the indexes of a Series object.

# Changing indexes of Series object


import pandas as pd
d={'a':101, 'b':111, 'c':121, 'd':131}
s=pd.Series(d)
s.index = ['have','a','nice', 'day']
print('s=\n', s)

Output
s=
have 101
a 111
nice 121
day 131
dtype: int64
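
When only some labels need to change, the rename() method of Series can be used instead of replacing the whole index. The sketch below uses illustrative new labels. rename() returns a new Series and leaves the original unchanged.

```python
import pandas as pd

s = pd.Series({'a': 101, 'b': 111, 'c': 121, 'd': 131})

# Map old labels to new labels; labels not in the mapping are kept as they are.
renamed = s.rename({'a': 'first', 'd': 'last'})

print(renamed)
print(s)
```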
