Dataframe Notes

The document provides an overview of DataFrames, a two-dimensional labeled data structure in Python's pandas library, including their creation from various data sources such as numpy arrays, lists of dictionaries, and series. It also highlights the differences between Series and DataFrames, emphasizing their dimensionality and data type flexibility. Additionally, it includes code examples demonstrating the creation of DataFrames with custom indices and column labels.

Uploaded by

Jayabharathi
Copyright: © All Rights Reserved
DataFrame

Introduction to DataFrame
• A DataFrame is a two-dimensional labelled data structure, like a spreadsheet.
• It contains rows and columns and therefore has both a row index and a column index.
• Each column can hold a different type of value, such as numeric, string, Boolean, etc., as in the tables of a database.
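
The points above can be sketched with a small example. The sample data and the column names (Name, Marks, Passed) are made up for illustration; the sketch only shows that each column keeps its own data type and that the DataFrame carries a row index and a column index.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type of value.
df = pd.DataFrame({
    "Name": ["Jaya", "Abi", "Kavi"],   # strings
    "Marks": [87.5, 91.0, 76.25],      # floats
    "Passed": [True, True, False],     # Booleans
})

print(df.dtypes)    # one data type per column
print(df.index)     # row index
print(df.columns)   # column index
```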

Difference Between Series and DataFrame


Sno  Property      Series                              DataFrame
1    Dimensions    1-dimensional                       2-dimensional
2    Type of data  Homogeneous, i.e. all elements      Heterogeneous, i.e. a DataFrame
                   must be of the same data type.      object can have elements of
                                                       different data types.
3    Mutability    Values are mutable, i.e. the        Values are mutable, i.e. the
                   values of elements can change.      values of elements can change.
4    Size          Size-immutable, i.e. the size of    Size-mutable, i.e. the size of a
                   a Series object, once created,      DataFrame object, once created,
                   cannot change. If you add or        can change in place. If you add
                   remove an element, a new Series     or remove an element, the change
                   object is created internally.       is made in the existing
                                                       DataFrame object.
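
The mutability rows of the table can be illustrated with a short sketch. The data is made up, and pd.concat is used here only as one common way of adding an element to a Series; it is an assumption of this example, not something the notes prescribe.

```python
import pandas as pd

s = pd.Series([14, 17, 15])
df = pd.DataFrame({"Age": [14, 17, 15]})

# Values are mutable in both structures.
s[0] = 20
df.loc[0, "Age"] = 20

# Adding a column changes the existing DataFrame object.
df_id_before = id(df)
df["Marks"] = [98, 78, 68]
print(df.shape)                 # (3, 2)
print(id(df) == df_id_before)   # True: still the same object

# Adding an element with pd.concat returns a new Series object.
s2 = pd.concat([s, pd.Series([13], index=[3])])
print(s2 is s)                  # False
print(len(s), len(s2))          # 3 4
```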

1. Creation of Empty DataFrame

CODE
import pandas as pd
df=pd.DataFrame()
print(df)
OUTPUT
Empty DataFrame
Columns: []
Index: []

2. Creation of DataFrame from a numpy ndarray (1D)

CODE
import pandas as pd
import numpy as np
arr=np.array([11,12,13,14])
df=pd.DataFrame(arr)
print(df)
#DataFrame created from a 1D numpy array
OUTPUT
    0
0  11
1  12
2  13
3  14

3. Creation of DataFrame from multiple numpy ndarrays

CODE
import pandas as pd
import numpy as np
arr=np.array([11,12,13,14])
arr1=np.array([1,2,3,4])
arr2=np.array([101,102,103,104])
df=pd.DataFrame([arr,arr1,arr2])
print(df)
OUTPUT
     0    1    2    3
0   11   12   13   14
1    1    2    3    4
2  101  102  103  104

4. Creation of DataFrame from numpy ndarrays with custom index and column labels
CODE
import pandas as pd
import numpy as np
arr=np.array([11,12,13,14])
arr1=np.array([1,2,3,4])
arr2=np.array([101,102,103,104])
df=pd.DataFrame([arr,arr1,arr2],index=['R1','R2','R3'],
columns=['C1','C2','C3','C4'])
print(df)
OUTPUT
     C1   C2   C3   C4
R1   11   12   13   14
R2    1    2    3    4
R3  101  102  103  104

5. Creation of DataFrame from List of Dictionaries

CODE
import pandas as pd
D1={'Name':'Jaya','Marks':87}
D2={'Name':'Abi','Age':17,'Marks':87}
D3={'Name':'Kavi','Age':18,'Marks':76}
l=[D1,D2,D3]
df=pd.DataFrame(l)
print(df)
OUTPUT
   Name  Marks   Age
0  Jaya     87   NaN
1   Abi     87  17.0
2  Kavi     76  18.0

Note: Dictionary keys become column labels by default in a DataFrame, and each dictionary in the list becomes a row. A key that is missing from a dictionary (here 'Age' in D1) gives NaN in that row.

6. Creation of DataFrame from List of Dictionaries with custom index value for rows
CODE
import pandas as pd
D1={'Name':'sai','Marks':87}
D2={'Name':'Abi','Age':17,'Marks':87}
D3={'Name':'Kavi','Age':18,'Marks':76}
l=[D1,D2,D3]
df=pd.DataFrame(l,index=['R1','R2','R3'])
print(df)
OUTPUT
    Name  Marks   Age
R1   sai     87   NaN
R2   Abi     87  17.0
R3  Kavi     76  18.0
7. Creation of DataFrame from List of Dictionaries with custom index value for columns
CODE
import pandas as pd
D1={'Name':'Jaya','Marks':87}
D2={'Name':'Abi','Age':17,'Marks':87}
D3={'Name':'Kavi','Age':18,'Marks':76}
L=[D1,D2,D3]
df=pd.DataFrame(L,index=['R1','R2','R3'],
columns=['a1','a2','a3'])
print(df)
OUTPUT
     a1   a2   a3
R1  NaN  NaN  NaN
R2  NaN  NaN  NaN
R3  NaN  NaN  NaN

Note: The column labels a1, a2 and a3 do not match any dictionary key, so every value is NaN.

8. Creation of DataFrame from List of Dictionaries with keys as column labels but in a changed sequence
CODE
import pandas as pd
D1={'Name':'Jaya','Marks':87}
D2={'Name':'Abi','Age':17,'Marks':87}
D3={'Name':'Kavi','Age':18,'Marks':76}
l=[D1,D2,D3]
df=pd.DataFrame(l,index=['R1','R2','R3'],
columns=['Name','Age','Marks'])
print(df)
OUTPUT
    Name   Age  Marks
R1  Jaya   NaN     87
R2   Abi  17.0     87
R3  Kavi  18.0     76

9. Creation of DataFrame from Dictionary of Lists

CODE
import pandas as pd
N=['jaya','bala','krish']
A=[14,17,15]
M=[98,78,68]
D={'Name':N,'Age':A,'Marks':M}
df=pd.DataFrame(D)
print(df)
OUTPUT
    Name  Age  Marks
0   jaya   14     98
1   bala   17     78
2  krish   15     68

10. Creation of DataFrame from Dictionary of Lists with custom index value for rows
CODE
import pandas as pd
N=['jaya','bala','krish']
A=[14,17,15]
M=[98,78,68]
D={'Name':N,'Age':A,'Marks':M}
df=pd.DataFrame(D,index=['R1','R2','R3'])
print(df)
OUTPUT
     Name  Age  Marks
R1   jaya   14     98
R2   bala   17     78
R3  krish   15     68

11. Creation of DataFrame from Dictionary of Lists with custom index value for columns
CODE
import pandas as pd
N=['jaya','bala','krish']
A=[14,17,15]
M=[98,78,68]
D={'Name':N,'Age':A,'Marks':M}
df=pd.DataFrame(D,index=['R1','R2','R3'],
columns=['a1','a2','a3'])
print(df)
OUTPUT
     a1   a2   a3
R1  NaN  NaN  NaN
R2  NaN  NaN  NaN
R3  NaN  NaN  NaN

Note: The column labels a1, a2 and a3 are not keys of the dictionary, so every value is NaN.

12. Creation of DataFrame from Dictionary of Lists with a changed sequence of columns
CODE
import pandas as pd
N=['jaya','bala','krish']
A=[14,17,15]
M=[98,78,68]
D={'Name':N,'Age':A,'Marks':M}
df=pd.DataFrame(D,index=['R1','R2','R3'],
columns=['Marks','Age','Name'])
print(df)
OUTPUT
    Marks  Age   Name
R1     98   14   jaya
R2     78   17   bala
R3     68   15  krish
13. Creation of DataFrame from Series (includes dtype)
• To create a DataFrame using more than one Series, we need to pass multiple Series in a list.
• The labels in the Series objects become the column names in the DataFrame object.
• Each Series becomes a row in the DataFrame.
• If a particular Series does not have a corresponding value for a label, NaN is inserted in that DataFrame column.
CODE
import pandas as pd
L=[14,17,15]
s=pd.Series(L)
print(s) #a Series prints its dtype on the last line
OUTPUT
0    14
1    17
2    15
dtype: int64
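
The bullet points above on building a DataFrame from several Series can be sketched as follows. The labels a, b, c and the row labels R1, R2 are made up for illustration; the second Series deliberately has no value for label c.

```python
import pandas as pd

s1 = pd.Series([11, 12, 13], index=["a", "b", "c"])
s2 = pd.Series([1, 2], index=["a", "b"])   # no value for label 'c'

# Each Series becomes one row; the Series labels become the column labels.
df = pd.DataFrame([s1, s2], index=["R1", "R2"])
print(df)
print(df.isna())   # True only where s2 had no value
```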

14. Creation of DataFrame from Series (dtype not included)

CODE
import pandas as pd
s1=pd.Series([11,12,13,14,15])
s2=pd.Series([1,2,3,4,5])
s3=pd.Series([111,122,133,144,155])
s4=pd.Series([21,22,23,24,9])
df=pd.DataFrame([s1,s2,s3,s4])
print(df)
OUTPUT
     0    1    2    3    4
0   11   12   13   14   15
1    1    2    3    4    5
2  111  122  133  144  155
3   21   22   23   24    9

15. Creation of DataFrame from Series (dtype not included) with custom index label for rows
CODE
import pandas as pd
s1=pd.Series([11,12,13,14,15])
s2=pd.Series([1,2,3,4,5])
s3=pd.Series([111,122,133,144,155])
s4=pd.Series([21,22,23,24,9])
df=pd.DataFrame([s1,s2,s3,s4],index=['a','b','c','d'])
print(df)
OUTPUT
     0    1    2    3    4
a   11   12   13   14   15
b    1    2    3    4    5
c  111  122  133  144  155
d   21   22   23   24    9
16. Creation of DataFrame from Series with custom index label for columns
CODE
import pandas as pd
s1=pd.Series([11,12,13,14,15],index=['a','b','c','d','e'])
s2=pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
s3=pd.Series([31,32,33,34,35],index=['a','b','c','d','e'])
s4=pd.Series([21,22,23,24,45],index=['a','b','c','d','e'])
df=pd.DataFrame([s1,s2,s3,s4])
print(df)
OUTPUT
    a   b   c   d   e
0  11  12  13  14  15
1   1   2   3   4   5
2  31  32  33  34  35
3  21  22  23  24  45

17. Creation of DataFrame from Series with custom index label for columns and rows
CODE
import pandas as pd
s1=pd.Series([11,12,13,14,15],index=['a','b','c','d','e'])
s2=pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
s3=pd.Series([41,42,43,44,45],index=['a','b','c','d','e'])
s4=pd.Series([21,22,23,24,45],index=['a','b','c','d','e'])
df=pd.DataFrame([s1,s2,s3,s4],index=['aa','bb','cc','dd'])
print(df)
OUTPUT
     a   b   c   d   e
aa  11  12  13  14  15
bb   1   2   3   4   5
cc  41  42  43  44  45
dd  21  22  23  24  45

18. Creation of DataFrame from Series (includes dtype) with custom index label for rows
CODE
import pandas as pd
L=[14,17,15]
s=pd.Series(L,index=['R1','R2','R3'])
print(s)
OUTPUT
R1    14
R2    17
R3    15
dtype: int64

19. Creation of DataFrame from Series (includes dtype) with custom index label for column
CODE
import pandas as pd
L=[14,17,15]
s=pd.Series(L,index=['R1','R2','R3'])
df=pd.DataFrame(s,columns=['C1'])
print(df)
OUTPUT
    C1
R1  14
R2  17
R3  15

20. Creation of DataFrame from Dictionary of Series

CODE
import pandas as pd
s1=pd.Series([11,12,13,14,15])
s2=pd.Series([1,2,3,4,5])
s3=pd.Series([111,122,133,144,155])
s4=pd.Series([21,22,23,24,9])
D={'key1':s1,'key2':s2,'key3':s3,'key4':s4}
df=pd.DataFrame(D)
print(df)
OUTPUT
   key1  key2  key3  key4
0    11     1   111    21
1    12     2   122    22
2    13     3   133    23
3    14     4   144    24
4    15     5   155     9

21. Creation of DataFrame from Dictionary of Dictionaries

CODE
import pandas as pd
D1={'Name':'Jaya','Marks':87}
D2={'Name':'Abi','Age':17,'Marks':87}
D3={'Name':'Kavi','Age':18,'Marks':76}
DD={"Humanities":D1,"Medical":D2,"Non Med":D3}
df=pd.DataFrame(DD)
print(df)
OUTPUT
      Humanities Medical Non Med
Name        Jaya     Abi    Kavi
Marks         87      87      76
Age          NaN      17      18

Note: The keys of the outer dictionary become the column labels, and the keys of the inner dictionaries become the index (row labels).

Select Options in Rows and Columns

CODE
#Select options in rows and columns
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df)
OUTPUT
        Name  Age  Marks Subject
R1      jaya   14     98      cs
R2      bala   17     78     bio
R3     krish   15     68      pe
R4    sakthi   15     65      ip
R5       abi   13     87      cs
R6  bharathi   14     98      ip
R7    geetha   13     76     bio
R8   sandhya   12     65      cs

Selecting a Single Column

Method-1: Using Square Brackets

Syntax: DataFrameObject[ColumnName]

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df['Name'])
print(df['Marks'])
OUTPUT
R1        jaya
R2        bala
R3       krish
R4      sakthi
R5         abi
R6    bharathi
R7      geetha
R8     sandhya
Name: Name, dtype: object
R1    98
R2    78
R3    68
R4    65
R5    87
R6    98
R7    76
R8    65
Name: Marks, dtype: int64

Method-2: Using Dot Notation

Syntax: DataFrameObject.ColumnName

Note: While using dot notation, the column name is written without quotes.

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.Name)
OUTPUT
R1        jaya
R2        bala
R3       krish
R4      sakthi
R5         abi
R6    bharathi
R7      geetha
R8     sandhya
Name: Name, dtype: object
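
Dot notation has limits that square brackets do not. The sketch below uses made-up column labels ("Total Marks" and "size") to show two cases where square brackets are the safer choice: a label containing a space, and a label that clashes with an existing DataFrame attribute.

```python
import pandas as pd

# Hypothetical data: one label contains a space, another clashes with
# the DataFrame attribute 'size'.
df = pd.DataFrame({
    "Name": ["jaya", "bala"],
    "Total Marks": [98, 78],
    "size": [1, 2],
})

print(df.Name)             # works: simple label
print(df["Total Marks"])   # a label with a space needs square brackets
print(df["size"])          # the column named 'size'
print(df.size)             # the attribute: total number of elements (6)
```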

Selecting Multiple Columns

Method-1: Using Double Square Brackets
To select multiple columns, give a list of the column names inside the square brackets of the DataFrame object.
Syntax: DataFrameObject[[col1,col2,col3…]]
CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df[['Name','Age']])
OUTPUT
        Name  Age
R1      jaya   14
R2      bala   17
R3     krish   15
R4    sakthi   15
R5       abi   13
R6  bharathi   14
R7    geetha   13
R8   sandhya   12

Selecting Multiple Rows

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc['R2':'R4']) #includes stop value
print(df.loc['R2':'R4',:]) #includes stop value

OUTPUT
      Name  Age  Marks Subject
R2    bala   17     78     bio
R3   krish   15     68      pe
R4  sakthi   15     65      ip
      Name  Age  Marks Subject
R2    bala   17     78     bio
R3   krish   15     68      pe
R4  sakthi   15     65      ip

Method-2

Accessing Data using loc

loc is used to select rows, columns, or a combination of both from the DataFrame by label.

Syntax:

DataFrameObject.loc[StartRow:EndRow:StepValue,StartColumn:EndColumn:StepValue]

Note:

With loc, both the start label and the stop label are included in the result.

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc['Name':'Marks']) #row slice; 'Name' and 'Marks' are not row labels, so the result is empty
print(df.loc[:,'Name':'Marks']) #all rows, columns from Name to Marks
OUTPUT
Empty DataFrame
Columns: [Name, Age, Marks, Subject]
Index: []
        Name  Age  Marks
R1      jaya   14     98
R2      bala   17     78
R3     krish   15     68
R4    sakthi   15     65
R5       abi   13     87
R6  bharathi   14     98
R7    geetha   13     76
R8   sandhya   12     65


Selecting Individual Element

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc['R5'])
OUTPUT
Name       abi
Age         13
Marks       87
Subject     cs
Name: R5, dtype: object

Select Using Step Values

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc[:,'Name':'Subject':2]) #columns
print("************************************")
print(df.loc['R1':'R7':3]) #rows
OUTPUT
        Name  Marks
R1      jaya     98
R2      bala     78
R3     krish     68
R4    sakthi     65
R5       abi     87
R6  bharathi     98
R7    geetha     76
R8   sandhya     65
************************************
      Name  Age  Marks Subject
R1    jaya   14     98      cs
R4  sakthi   15     65      ip
R7  geetha   13     76     bio

KeyError

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Each print below raises KeyError when run on its own
print(df['R2']) #KeyError: 'R2' is a row label, but square brackets look up column labels
print(df[['R2','R3']]) #KeyError: neither label is a column label
print(df.loc['Name']) #KeyError: 'Name' is a column label, but loc with one label looks up row labels
OUTPUT
KeyError
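
To see the error without stopping the program, the lookups can be wrapped in try/except. This is a minimal sketch on a smaller, made-up DataFrame; it also shows the label lookups that do work.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["jaya", "bala"], "Age": [14, 17]},
                  index=["R1", "R2"])

try:
    df["R2"]          # square brackets look up column labels
except KeyError as err:
    print("KeyError:", err)

try:
    df.loc["Name"]    # loc with a single label looks up row labels
except KeyError as err:
    print("KeyError:", err)

print(df.loc["R2"])   # row selected by its label
print(df["Name"])     # column selected by its label
```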

Selecting A Single Row

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc[['R5']]) #a list with one label gives a one-row DataFrame
print("************************************")
print(df.loc['R3']) #a single label gives the row as a Series
OUTPUT
   Name  Age  Marks Subject
R5  abi   13     87      cs
************************************
Name       krish
Age           15
Marks         68
Subject       pe
Name: R3, dtype: object

Selecting Rows And Columns At A Time

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.loc['R2':'R6','Name':'Marks'])
print("************************************")
print(df.loc['R2':'R6','Name':'Marks':2]) #includes step value
print("************************************")
print(df.loc['R2':'R6',['Name','Subject']])
print("************************************")
print(df.loc[['R2','R4'],['Name','Subject']])
print("************************************")
print(df.loc['R1':'R7':2,'Name':'Marks':2])
print("************************************")
print(df.loc['R2':'R5'])
print(df.iloc[0:3]) #positional, iloc, excludes stop value
OUTPUT
        Name  Age  Marks
R2      bala   17     78
R3     krish   15     68
R4    sakthi   15     65
R5       abi   13     87
R6  bharathi   14     98
************************************
        Name  Marks
R2      bala     78
R3     krish     68
R4    sakthi     65
R5       abi     87
R6  bharathi     98
************************************
        Name Subject
R2      bala     bio
R3     krish      pe
R4    sakthi      ip
R5       abi      cs
R6  bharathi      ip
************************************
      Name Subject
R2    bala     bio
R4  sakthi      ip
************************************
      Name  Marks
R1    jaya     98
R3   krish     68
R5     abi     87
R7  geetha     76
************************************
      Name  Age  Marks Subject
R2    bala   17     78     bio
R3   krish   15     68      pe
R4  sakthi   15     65      ip
R5     abi   13     87      cs
     Name  Age  Marks Subject
R1   jaya   14     98      cs
R2   bala   17     78     bio
R3  krish   15     68      pe

Accessing Elements Using iloc

If we want to extract a subset from a DataFrame using the numeric index/position of rows and columns, then we can use iloc.

Syntax:

df.iloc[StartRowIndex:EndRowIndex:StepValue,StartColumnIndex:EndColumnIndex:StepValue]

iloc works like the slicing operation.

Here, the EndRowIndex and EndColumnIndex values are not included.

Comparison Between iloc and loc
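
A minimal sketch of the comparison, using a smaller made-up DataFrame: loc slices by label and includes the stop value, while iloc slices by position and excludes it, so the two calls below select the same rows and columns.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jaya", "bala", "krish", "sakthi"], "Age": [14, 17, 15, 15]},
    index=["R1", "R2", "R3", "R4"],
)

by_label = df.loc["R1":"R3", "Name":"Age"]   # labels, stop value included
by_position = df.iloc[0:3, 0:2]              # positions, stop value excluded

print(by_label)
print(by_position)
print(by_label.equals(by_position))          # True
```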

Display Rows at Index

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Display all rows, columns at index 0 and 1
print(df.iloc[:,0:2])
print("************************************")
#Display rows at index 0 to 2
print(df.iloc[0:3])
print("************************************")
#Display rows at index 0 to 3, columns at index 0 and 1
print(df.iloc[0:4,0:2])
print("************************************")
#Display rows at index 2 to 5, columns at index 0 to 2
print(df.iloc[2:6,0:3])
OUTPUT
        Name  Age
R1      jaya   14
R2      bala   17
R3     krish   15
R4    sakthi   15
R5       abi   13
R6  bharathi   14
R7    geetha   13
R8   sandhya   12
************************************
     Name  Age  Marks Subject
R1   jaya   14     98      cs
R2   bala   17     78     bio
R3  krish   15     68      pe
************************************
      Name  Age
R1    jaya   14
R2    bala   17
R3   krish   15
R4  sakthi   15
************************************
        Name  Age  Marks
R3     krish   15     68
R4    sakthi   15     65
R5       abi   13     87
R6  bharathi   14     98

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Display rows at index 1 and 5, Name and Marks columns
print(df.iloc[[1,5],[0,2]])
print("####################################")
#Same selection, with the columns given as a slice with step 2
print(df.iloc[[1,5],0:3:2])
print("************************************")
#Display rows at index 0 and 2, columns at index 0 and 1
print(df.iloc[0:4:2,0:2])
print("************************************")
#Complete DataFrame, all rows and columns
print(df.iloc[:])
print("************************************")
OUTPUT
        Name  Marks
R2      bala     78
R6  bharathi     98
####################################
        Name  Marks
R2      bala     78
R6  bharathi     98
************************************
     Name  Age
R1   jaya   14
R3  krish   15
************************************
        Name  Age  Marks Subject
R1      jaya   14     98      cs
R2      bala   17     78     bio
R3     krish   15     68      pe
R4    sakthi   15     65      ip
R5       abi   13     87      cs
R6  bharathi   14     98      ip
R7    geetha   13     76     bio
R8   sandhya   12     65      cs
************************************

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Display rows at index 2 to 4 (both inclusive)
print(df.iloc[2:5])
print("************************************")
#From rows at index 2 to 4, display all columns
print(df.iloc[2:5,:])
print("************************************")
#Row index from 2 to 4 and column index only 0 and 3
print(df.iloc[2:5,[0,3]])
print("************************************")
#IndexError: iloc cannot use custom index labels
print(df.iloc[2:5,['Name','Marks']])
print("************************************")
OUTPUT
      Name  Age  Marks Subject
R3   krish   15     68      pe
R4  sakthi   15     65      ip
R5     abi   13     87      cs
************************************
      Name  Age  Marks Subject
R3   krish   15     68      pe
R4  sakthi   15     65      ip
R5     abi   13     87      cs
************************************
      Name Subject
R3   krish      pe
R4  sakthi      ip
R5     abi      cs
************************************
IndexError

Selecting / Accessing Individual Value

(i) Give either the row label or the numeric index of the row in square brackets after the column name

<df object>.<column>[row label or row numeric index]

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#A numeric index on a labelled column is a positional lookup; newer pandas versions expect df.Name.iloc[3]
print(df.Name[3])
print("************************************")
print(df.Marks[2])
print("************************************")
print(df.Marks['R5'])
print("************************************")
OUTPUT
sakthi
************************************
68
************************************
87
************************************
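
A caution on the numeric form: when the row labels are strings such as R1, R2, recent pandas versions warn about or reject an integer in square brackets on a column, because the integer is meant to be a label. A minimal sketch of spellings that state the intent explicitly, on a smaller made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jaya", "bala", "krish", "sakthi"], "Marks": [98, 78, 68, 65]},
    index=["R1", "R2", "R3", "R4"],
)

print(df["Name"].iloc[3])      # by position -> sakthi
print(df["Marks"].loc["R3"])   # by row label -> 68
print(df.loc["R3", "Marks"])   # same value with a single loc call -> 68
```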

(ii) We can use the at or iat attributes with a DataFrame object

Using at

It is used to access a single value for a row/column label pair.

Syntax:

<DF object>.at[Rowlabel,Columnlabel]

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Display subject of geetha
print(df.at['R7','Subject'])
OUTPUT
bio


Using iat: It is used to access a single value for a row/column pair by integer position.

Syntax: <DF object>.iat[Rowindex,Columnindex]

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
print(df.iat[3,2])
print(df.iat[4,3])
OUTPUT
65
cs
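
at and iat address the same cell in two ways, and both can also be used to assign a value. A minimal sketch on a smaller made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jaya", "bala", "krish"], "Marks": [98, 78, 68]},
    index=["R1", "R2", "R3"],
)

print(df.at["R2", "Marks"])   # label pair -> 78
print(df.iat[1, 1])           # same cell by integer position -> 78

df.at["R2", "Marks"] = 80     # assignment through at
print(df.iat[1, 1])           # -> 80
```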

Individual Elements in a List

CODE
import pandas as pd
N=['jaya','bala','krish','sakthi','abi','bharathi','geetha','sandhya']
A=[14,17,15,15,13,14,13,12]
M=[98,78,68,65,87,98,76,65]
S=['cs','bio','pe','ip','cs','ip','bio','cs']
D={'Name':N,'Age':A,'Marks':M,'Subject':S}
df=pd.DataFrame(D,index=['R1','R2','R3','R4','R5','R6','R7','R8'],
columns=['Name','Age','Marks','Subject'])
#Display marks of bharathi
print(df.Marks['R6'])
#Display age of abi (positional lookup; newer pandas versions expect df.Age.iloc[4])
print(df.Age[4])
OUTPUT
98
13

Attributes of DataFrame

When we create a DataFrame object, all information related to it (such as its size, its data type, its dimensions, etc.) is available through its attributes.

Syntax:

<DataFrameObject>.AttributeName

Attribute  Description
index      Returns the index (row labels) of the DataFrame
columns    Returns the column labels of the DataFrame
size       Returns the total number of elements in the DataFrame
shape      Returns a tuple representing the dimensionality of the DataFrame
empty      Checks if the DataFrame is empty
ndim       Returns the number of dimensions of the DataFrame
T          Transposes the DataFrame (switches rows and columns)

Example

CODE
import pandas as pd
data = {
'Student Name': ['Ravi', 'Priya', 'Rahul'],
'Age': [21, 20, 22],
'City': ['Mumbai', 'Delhi', 'Bangalore']
}
df = pd.DataFrame(data)
print("Row labels (index):", df.index)
#Row labels (index): RangeIndex(start=0, stop=3, step=1)
print("Column labels:", df.columns)
#Column labels: Index(['Student Name', 'Age', 'City'], dtype='object')
print(df.size)
#Output: 9 (3*3)
print(df.shape)
#Output: (3, 3) (3 rows, 3 columns)
print(df.empty)
#Output: False (since df has data)
print(df.ndim)
#Output: 2
print(df.T)
#Output:
#                 0      1          2
#Student Name  Ravi  Priya      Rahul
#Age             21     20         22
#City        Mumbai  Delhi  Bangalore

Methods in DataFrame

head() function

• Returns the first n rows of the DataFrame.
• If the value for n is not passed, then by default n takes 5 and the first five rows are displayed.

CODE
import pandas as pd
data = {
'Name': ['A', 'B', 'C','D','E','F','G'],
'Age': [25, 30, 35,24,35,22,34],
'City': ['Delhi', 'Goa', 'Mumbai','AP','MP','TN','Goa']
}
df = pd.DataFrame(data,
index=['Stud1','Stud2','Stud3','Stud4','Stud5','Stud6','Stud7'])
print(df.head(3))
print(df.head())
print(df.head(-1))
OUTPUT
      Name  Age    City
Stud1    A   25   Delhi
Stud2    B   30     Goa
Stud3    C   35  Mumbai
      Name  Age    City
Stud1    A   25   Delhi
Stud2    B   30     Goa
Stud3    C   35  Mumbai
Stud4    D   24      AP
Stud5    E   35      MP
      Name  Age    City
Stud1    A   25   Delhi
Stud2    B   30     Goa
Stud3    C   35  Mumbai
Stud4    D   24      AP
Stud5    E   35      MP
Stud6    F   22      TN

tail() function

• Returns the last n rows of the DataFrame.
• If the value for n is not passed, then by default n takes 5 and the last five rows are displayed.

CODE
import pandas as pd
data = {
'Name': ['A', 'B', 'C','D','E','F','G'],
'Age': [25, 30, 35,24,35,22,34],
'City': ['Delhi', 'Goa', 'Mumbai','AP','MP','TN','Goa']
}
df = pd.DataFrame(data,
index=['Stud1','Stud2','Stud3','Stud4','Stud5','Stud6','Stud7'])
print(df.tail(1))
print(df.tail())
print(df.tail(-3))
OUTPUT
      Name  Age City
Stud7    G   34  Goa
      Name  Age    City
Stud3    C   35  Mumbai
Stud4    D   24      AP
Stud5    E   35      MP
Stud6    F   22      TN
Stud7    G   34     Goa
      Name  Age City
Stud4    D   24   AP
Stud5    E   35   MP
Stud6    F   22   TN
Stud7    G   34  Goa

Note: If you pass a negative integer n to head(), it will return all rows except the last
n rows and if you pass a negative integer n to tail(), it will return all rows except the
first n rows.
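
The note on negative values of n can be checked with a short sketch. It uses a made-up seven-row DataFrame and compares head() and tail() with the equivalent iloc slices.

```python
import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEFG")})   # 7 rows, positions 0 to 6

print(len(df.head()))     # 5 (default n)
print(len(df.head(-1)))   # 6: all rows except the last 1
print(len(df.tail(-3)))   # 4: all rows except the first 3

# head(-n) matches iloc[:-n]; tail(-n) matches iloc[n:]
print(df.head(-1).equals(df.iloc[:-1]))   # True
print(df.tail(-3).equals(df.iloc[3:]))    # True
```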

Accessing Elements through Indexing

Two Types of Indexing

Label Indexing

• In label indexing, we can access the elements of the DataFrame with the help of either row or column labels.
• There are various methods to access the elements of a DataFrame using labels.
• loc and at are the two popular techniques for label-based indexing.

Code to display columns and rows using labels

CODE
import pandas as pd
data = {
'Name': ['A', 'B', 'C','D','E','F','G'],
'Age': [25, 30, 35,24,35,22,34],
'City': ['Delhi', 'Goa', 'Mumbai','AP','MP','TN','Goa']
}
df = pd.DataFrame(data,
index=['Stud1','Stud2','Stud3','Stud4','Stud5','Stud6','Stud7'])
print(df)
#Displaying a column
print(df.Name) #same result as print(df['Name']) or print(df.loc[:,'Name'])
#Displaying a column
print(df['Age']) #same result as print(df.Age) or print(df.loc[:,'Age'])
#Displaying rows
print(df.loc[['Stud2','Stud4']])
OUTPUT
      Name  Age    City
Stud1    A   25   Delhi
Stud2    B   30     Goa
Stud3    C   35  Mumbai
Stud4    D   24      AP
Stud5    E   35      MP
Stud6    F   22      TN
Stud7    G   34     Goa
Stud1    A
Stud2    B
Stud3    C
Stud4    D
Stud5    E
Stud6    F
Stud7    G
Name: Name, dtype: object
Stud1    25
Stud2    30
Stud3    35
Stud4    24
Stud5    35
Stud6    22
Stud7    34
Name: Age, dtype: int64
      Name  Age City
Stud2    B   30  Goa
Stud4    D   24   AP

Boolean Indexing

• Boolean indexing in pandas DataFrames allows you to filter data based on specific conditions.
• It involves creating a Boolean Series (a Series of True/False values) and using it to select rows that meet the condition.

Code to display details of students who scored more than 80 marks

CODE
import pandas as pd
data = {
'Name': ['Ravi', 'Priya', 'Rahul', 'Sneha', 'Amit'],
'Marks': [75, 82, 68, 91, 80]
}
# Create DataFrame
df = pd.DataFrame(data)
print("Names with more than 80 marks:")
print(df[df['Marks'] > 80])
print(df['Marks'] > 80)
OUTPUT
Names with more than 80 marks:
    Name  Marks
1  Priya     82
3  Sneha     91
0    False
1     True
2    False
3     True
4    False
Name: Marks, dtype: bool
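
Conditions can also be combined. The sketch below reuses the same sample data; the thresholds 70 and 90 are picked only for illustration. Conditions are joined with & (and) or | (or), each condition needs its own parentheses, and ~ inverts a Boolean Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ravi", "Priya", "Rahul", "Sneha", "Amit"],
    "Marks": [75, 82, 68, 91, 80],
})

# Marks above 70 and below 90; the parentheses are required
mask = (df["Marks"] > 70) & (df["Marks"] < 90)
print(df[mask])               # rows for Ravi, Priya and Amit
print(df.loc[mask, "Name"])   # only the Name column of those rows

print(df[~mask])              # rows where the condition is False
```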

Method-2: Boolean values as index labels

CODE
import pandas as pd
#Create a dictionary
data = {'name':["Rachel", "Monica", "Joey", "Phoebe"],
'job': ["Doctor", "Chef", "Actor", "Singer"],
'Age':[28, 28, 30, 31]}
#Create a DataFrame with Boolean values as the index
df = pd.DataFrame(data, index = [False, True, True, False])
print(df)
#Select the rows whose index label is True
print(df.loc[True])
OUTPUT
         name     job  Age
False  Rachel  Doctor   28
True   Monica    Chef   28
True     Joey   Actor   30
False  Phoebe  Singer   31
        name    job  Age
True  Monica   Chef   28
True    Joey  Actor   30
