Revision Point - Dataframe
C - List of Dictionary
Recall that a dictionary is of the form { key1 : value1 , key2 : value2 , - - - }.
The keys of the dictionary become the column names in the DataFrame object and the values of the dictionary become the column values of the DataFrame object.
Example 1
import pandas
d1 = {'roll' : 101 , 'name' : 'Astha' , 'tot_mark' : 456 }
d2 = {'roll' : 104 , 'name' : 'Gautam', 'tot_mark' : 478 }
d3 = {'roll' : 105 , 'name' : 'Deepika', 'tot_mark' : 453 , 'grade' : 'A2' }
L = [ d1 , d2, d3 ]
df_list = pandas.DataFrame(L)
print("Data Frame from list of dictionaries ")
print(df_list)
Example 3
We can also use index=[list_of_row_labels] and columns=[list_of_column_labels] to specify the row index as well as the column index.
# Example 3, dataframe from a list of dictionaries with row index & column index
import pandas
L = [ {'roll' : 101 , 'name' : 'Astha' } ,
      {'roll' : 104 , 'name' : 'Gautam', 'mark' : 478} ]
DF = pandas.DataFrame( L , index=['s1' , 's2'] , columns=['roll' , 'name'] )
# note: here column 'mark' is skipped
print("First DataFrame" )
print(DF)
DF2 = pandas.DataFrame( L , index=['s1','s2'] , columns=['roll' , 'name' , 'age'] )
# Here, column 'age' is an additional column, which does not exist in the list of dictionaries
print("Second DataFrame is")
print(DF2)
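The effect of index= and columns= on a list of dictionaries can be checked with a short sketch. It reuses the two dictionaries from Example 3; the variable names records, df_all and df_sel are illustrative and not from the notes.

```python
import pandas as pd

# Same two dictionaries as in Example 3; only the second one has a 'mark' key
records = [
    {'roll': 101, 'name': 'Astha'},
    {'roll': 104, 'name': 'Gautam', 'mark': 478},
]

# Without columns=, every key found in any dictionary becomes a column;
# a key missing from a dictionary is filled with NaN for that row
df_all = pd.DataFrame(records, index=['s1', 's2'])
print(df_all)

# With columns=, only the listed labels are kept ('mark' is skipped) and a
# label that exists in no dictionary ('age') becomes a column of NaN values
df_sel = pd.DataFrame(records, index=['s1', 's2'],
                      columns=['roll', 'name', 'age'])
print(df_sel)
```

Printing both frames side by side makes it easy to see which columns come from the data and which come only from the columns= list.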
The read_csv() method has many parameters to control the kind of data imported to create the DataFrame.
Example 2
To show the shape (number of rows and columns) of a CSV file imported in a DataFrame
# sdf is a DataFrame already created from the CSV file
r, c = sdf.shape
print("\nTotal rows", r, "Total columns", c)
Example 5
To read a CSV file without header
import pandas as pp
# header=None: do not treat the first row of the file as column headings;
# the columns are then labelled 0, 1, 2, ...
DH = pp.read_csv("stu_result.csv", header=None)
print("The DataFrame is\n", DH)
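A minimal sketch of the shape and header=None ideas follows. It does not depend on stu_result.csv: the CSV text is made up for illustration and read from memory through io.StringIO, so the column names and values are assumptions.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as stu_result.csv
csv_text = (
    "Adm_No,Name,Class,Marks\n"
    "1001,Astha,12,456\n"
    "1002,Gautam,12,478\n"
)

# Default behaviour: the first line supplies the column headings
sdf = pd.read_csv(io.StringIO(csv_text))
r, c = sdf.shape
print("Total rows", r, "Total columns", c)

# header=None: the first line is read as ordinary data and the
# columns are labelled 0, 1, 2, 3
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header)
```

With header=None the heading line is counted as a data row, so the second DataFrame has one row more than the first.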
Page 12 | KVS REGIONAL OFFICE, JAIPUR | SUBJECT - INFORMATICS PRACTICES (TERM-I SESSION 2021-22)
Here, Adm_No will be the first column instead of indices.
Example 7
To read a CSV file with new column names
import pandas as pp
# to use column names different from those in the file, use skiprows along with names
DF = pp.read_csv("stu_result.csv", skiprows=1,
                 names=['StuNo', 'SName', 'SClass', 'T_Marks'])
print('DataFrame\n', DF)

Display/Iteration of DataFrame:-
import pandas as pd
L1=[1,2,3,4,5]
L2=[10,20,30,40,50]
df=pd.DataFrame([L1,L2],columns=['a','b','c','d','e'])
print(df)   # display entire DataFrame
Output:
    a   b   c   d   e
0   1   2   3   4   5
1  10  20  30  40  50
Display columns
print(df['a'])   # display data of a particular column (column a)
Output:
0     1
1    10
Name: a, dtype: int64
print(df[['a','c','e']])   # display data of multiple columns (columns a, c and e)
Output:
    a   c   e
0   1   3   5
1  10  30  50

Display rows using loc method:-
Syntax-
<DataFrame object>.loc[<startrow>:<endrow>,<startcolumn>:<endcolumn>]
Examples:
print(df.loc[1])   # display data of a particular single row (row 1)
Output:
a    10
b    20
c    30
d    40
e    50
Name: 1, dtype: int64
print(df.loc[0:1])   # display data of multiple rows by using slicing (rows 0 and 1)
Output:
    a   b   c   d   e
0   1   2   3   4   5
1  10  20  30  40  50
print(df.loc[0:1,'a'])   # display data of multiple rows with a single column by using slicing (rows 0, 1 and column a)
Output:
0     1
1    10
Name: a, dtype: int64
print(df.loc[0:1,'a':'c'])   # display data of multiple rows with multiple columns using slicing (rows 0, 1 and columns a, b, c)
Output:
    a   b   c
0   1   2   3
1  10  20  30

Display rows using iloc method:-
This method is used when the DataFrame object does not have row and column labels, or when we may not remember them. It works on numeric index.
Syntax:-
<DataFrame object>.iloc[<startrowindex>:<endrowindex>,<startcolumnindex>:<endcolumnindex>]
Examples:
print(df.iloc[0:2,1:3])   # display rows at index 0, 1 and columns at index 1, 2
Output:
    b   c
0   2   3
1  20  30
print(df.iloc[0:2,:])   # display rows at index 0, 1 with all columns
Output:
    a   b   c   d   e
0   1   2   3   4   5
1  10  20  30  40  50

Difference between loc and iloc method:-
In the loc method both the start label and the end label are included, but in the iloc method the end index is excluded when given as start:end.

Operations on rows and columns in DataFrames:-
We can perform some basic operations on rows and columns of a DataFrame like selection, deletion, addition, and renaming.
import pandas as pd
dict = {'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
        'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
        'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
        'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
        'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}
ResultDF = pd.DataFrame(dict)
print(ResultDF)
Output:
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99

Adding a New Column to a DataFrame: To add a new column to the DataFrame ResultDF we can write the following statement:
>>>ResultDF['Radha']=[89,78,76]
Or
ResultDF.loc[:,'Radha']=[89,78,76]
Or
ResultDF.at[:,'Radha']=[89,78,76]
(at is documented for reading or writing a single value, so the [] and loc forms are the more dependable ones for a whole column or row.)
>>>print(ResultDF)
Output:-
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths       90     92        89    81       94     89
Science     91     81        91    71       95     78
Hindi       97     96        88    67       99     76
Note: Assigning values to a column label that does not exist will create a new column at the end. If the label already exists, the assignment statement will update the values of the existing column.
Example:
ResultDF['Ramit']=[99, 98, 78]
>>>print(ResultDF)
Output:
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths       90     99        89    81       94     89
Science     91     98        91    71       95     78
Hindi       97     78        88    67       99     76

Adding a New Row to a DataFrame: To add a new row to a DataFrame we can use the DataFrame.loc[ ] method.
Suppose we want to add English marks in the above DataFrame; we can write the following statement:
ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
>>>print(ResultDF)
Or
ResultDF.at['English'] = [85, 86, 83, 80, 90, 89]
>>>print(ResultDF)
Output:
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths       90     99        89    81       94     89
Science     91     98        91    71       95     78
Hindi       97     78        88    67       99     76
English     85     86        83    80       90     89
The DataFrame.loc[] method can also be used to change the data values of a row to a particular value.
Example: to set marks in 'Maths' for all columns to 0:
>>>ResultDF.loc['Maths']=0
>>>print(ResultDF)
Output:
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths        0      0         0     0        0      0
Science     91     98        91    71       95     78
Hindi       97     78        88    67       99     76
English     85     86        83    80       90     89
>>>ResultDF[:] = 0   # Set all values in ResultDF to 0
>>>ResultDF
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths        0      0         0     0        0      0
Science      0      0         0     0        0      0
Hindi        0      0         0     0        0      0
English      0      0         0     0        0      0

Output:-
Delhi      10927986
Mumbai     12691836
Kolkata     4631392
Selecting / Accessing multiple columns: Just use the following syntax
<DF_object>[[<column_name1>,<column_name2>,<column_name3>......]]
Output:-
         Population  Hospital
Delhi      10927986       189
Mumbai     12691836       208
Kolkata     4631392       149
Selecting / Accessing a subset from a DataFrame using Row / Column Names: Use the following syntax :-
<DF_object>.loc[<start_row>:<end_row>,<start_column>:<end_column>]
or
<DF_object>.iloc[<start_row_index>:<end_row_index>,<start_column_index>:<end_column_index>]
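The column and subset selections described above can be sketched on the city data shown in the outputs. The Population and Hospital figures are copied from those outputs; the DataFrame name city_df and the particular slices are assumptions made for illustration.

```python
import pandas as pd

# Figures taken from the outputs above; city_df is a hypothetical name
city_df = pd.DataFrame(
    {'Population': [10927986, 12691836, 4631392],
     'Hospital': [189, 208, 149]},
    index=['Delhi', 'Mumbai', 'Kolkata'],
)

# A single label gives a Series; a list of labels gives a DataFrame
one_col = city_df['Population']
two_cols = city_df[['Population', 'Hospital']]

# loc slices by label and includes BOTH end points
by_label = city_df.loc['Delhi':'Mumbai', 'Population':'Hospital']

# iloc slices by position and excludes the end position
by_pos = city_df.iloc[0:2, 0:2]

print(by_label)
print(by_label.equals(by_pos))
```

Both slices pick the rows Delhi and Mumbai with both columns, which illustrates that loc['Delhi':'Mumbai'] includes the end label whereas iloc[0:2] stops before position 2.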
To delete rows or columns we can use the DataFrame.drop() method: for deleting a row set axis=0 and for deleting a column set axis=1. Consider the following DataFrame:
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths       90     99        89    81       94     89
Science     91     98        91    71       95     78
Hindi       97     78        88    67       99     76
English     85     86        83    80       90     89
To delete the row with label 'Science' we can write the following statement:
>>>ResultDF = ResultDF.drop('Science', axis=0)
>>>ResultDF
Output:
         Arnab  Ramit  Samridhi  Riya  Mallika  Radha
Maths       90     99        89    81       94     89
Hindi       97     78        88    67       99     76
English     85     86        83    80       90     89
To delete the columns having labels 'Samridhi', 'Ramit' and 'Riya' we can write the following statement:-
>>>ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
>>>ResultDF
Output:
         Arnab  Mallika  Radha
Maths       90       94     89
Hindi       97       99     76
English     85       90     89

Renaming Row Labels of a DataFrame: The DataFrame.rename() method is used to rename the row and column labels. To rename the row indices Maths to Sub1 and Hindi to Sub2 in the above DataFrame we can write the following statement:-
ResultDF=ResultDF.rename({'Maths':'Sub1','Hindi':'Sub2'}, axis='index')
print(ResultDF)
Output:
         Arnab  Mallika  Radha
Sub1        90       94     89
Sub2        97       99     76
English     85       90     89
Note: The parameter axis='index' is used to specify that the row label is to be changed and axis='columns' to specify that the column label is to be changed.

Renaming Column Labels of a DataFrame:
ResultDF=ResultDF.rename({'Arnab':'Student1','Mallika':'Student2','Radha':'Student3'}, axis='columns')
>>>print(ResultDF)
Output:
         Student1  Student2  Student3
Sub1           90        94        89
Sub2           97        99        76
English        85        90        89
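The addition, deletion and renaming steps above can be run in one pass, as in the following sketch. The marks are those used in the notes; the variable names subjects, marks, result, trimmed and renamed are illustrative (marks is used here instead of dict so that the built-in dict type is not shadowed).

```python
import pandas as pd

subjects = ['Maths', 'Science', 'Hindi']
marks = {
    'Arnab': pd.Series([90, 91, 97], index=subjects),
    'Ramit': pd.Series([92, 81, 96], index=subjects),
    'Samridhi': pd.Series([89, 91, 88], index=subjects),
    'Riya': pd.Series([81, 71, 67], index=subjects),
    'Mallika': pd.Series([94, 95, 99], index=subjects),
}
result = pd.DataFrame(marks)

# Addition: a new label creates a column, an existing label updates it
result['Radha'] = [89, 78, 76]
result['Ramit'] = [99, 98, 78]

# Addition of a row: one value per existing column
result.loc['English'] = [85, 86, 83, 80, 90, 89]

# Deletion: drop() returns a new DataFrame and leaves 'result' unchanged
trimmed = result.drop('Science', axis=0)
trimmed = trimmed.drop(['Samridhi', 'Ramit', 'Riya'], axis=1)

# Renaming: axis='index' targets row labels, axis='columns' column labels
renamed = trimmed.rename({'Maths': 'Sub1', 'Hindi': 'Sub2'}, axis='index')
renamed = renamed.rename(
    {'Arnab': 'Student1', 'Mallika': 'Student2', 'Radha': 'Student3'},
    axis='columns',
)
print(renamed)
```

Keeping the intermediate results in separate variables is a design choice for the sketch: it shows that drop() and rename() do not modify the original DataFrame unless the result is assigned back, as the notes do with ResultDF = ResultDF.drop(...).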
OUTPUT
Hindi English IP
Aditya 34 23 67
Mohit 45 21 32
False, True, True, False, so
output
Hindi English IP
Aditya 34 23 67
Rajesh 60 80 91
We can include specific column(s) in our output in two ways.
To display only the IP column in place of all columns, we can modify the above code as given below:
df1
OR
Output
Aditya 67
Rajesh 91
Name: IP, dtype: int64
If Hindi and IP marks are to be displayed for the same problem stated above, the code will be:
df1
OR
output
Hindi IP
Aditya 34 67
Rajesh 60 91
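The statements for these outputs did not survive in the notes, so what follows is only a sketch of one way to obtain them, not the original code. It uses the three rows of df1 that are visible in the outputs, and it assumes the condition IP > 50, chosen only because it selects Aditya and Rajesh; the names mask, selected, ip_only, ip_only_alt and hindi_ip are illustrative.

```python
import pandas as pd

# Rows visible in the outputs above; the original df1 may have had more rows
df1 = pd.DataFrame(
    {'Hindi': [34, 45, 60], 'English': [23, 21, 80], 'IP': [67, 32, 91]},
    index=['Aditya', 'Mohit', 'Rajesh'],
)

# Assumed condition: a Boolean Series with one True/False per row
mask = df1['IP'] > 50
selected = df1[mask]              # rows where the mask is True, all columns
print(selected)

# Way 1: filter the rows first, then pick the column(s)
ip_only = df1[mask]['IP']
# Way 2: give the row condition and the column label(s) to loc together
ip_only_alt = df1.loc[mask, 'IP']
print(ip_only)

# Several columns: pass a list of labels
hindi_ip = df1.loc[mask, ['Hindi', 'IP']]
print(hindi_ip)
```

Both ways give the same Series here; the loc form does the row and column selection in a single step.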