Chapter 2 Python Pandas - II

Chapter 2 of the document focuses on advanced operations with Python Pandas, particularly dataframes, including iterating over rows and columns, performing binary operations, and descriptive statistics. It covers functions such as iterrows(), iteritems(), and various statistical functions like mode(), mean(), and median(). Additionally, it explains methods for combining dataframes using concat(), join(), and merge(), along with examples and homework questions for practice.

Uploaded by

mainshabhatnagar

Chapter-2

Python Pandas – II
Introduction – In this chapter we shall talk more about dataframes, basic
operations of dataframe, descriptive statistics, pivoting, handling missing data,
combining/merging etc.
Iterating Over a DataFrame
iterrows() – Iterates over a dataframe row-wise, yielding each horizontal subset
as a (row index, Series) pair, where the Series contains all column values for
that row index.
Example 1 : Write a program to print the DataFrame df, one row at a time.
import pandas as pd
# 'data' is used instead of 'dict' so that the built-in dict type is not shadowed
data = {'Name':['Ram','Mohan','Sachin'],
        'Marks':[95,88,89]}
df = pd.DataFrame(data, index = ['Rno 1','Rno 2','Rno 3'])
print(df)

Ans :
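The answer listing is not reproduced in this copy. A minimal sketch of one possible solution, assuming the dataframe of Example 1 (the names marks_data, row_index, row_series and rows_seen are illustrative, not from the original):

```python
import pandas as pd

marks_data = {'Name': ['Ram', 'Mohan', 'Sachin'],
              'Marks': [95, 88, 89]}
df = pd.DataFrame(marks_data, index=['Rno 1', 'Rno 2', 'Rno 3'])

rows_seen = []  # illustrative helper: records the order in which rows are visited
# iterrows() yields one (row index, Series) pair per row
for row_index, row_series in df.iterrows():
    print('Row index :', row_index)
    print('Row values :')
    print(row_series)
    rows_seen.append(row_index)
```

Each row_series is indexed by the column names, so it holds one value per column for that row.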

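iteritems() function – This function returns a vertical subset from a dataframe
in the form of (column index, Series), where the Series object contains the values
of all rows in that column. Note that iteritems() was deprecated in pandas 1.5 and
removed in pandas 2.0; items() works the same way and should be used in current versions.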
Example 2 : Write a program to print the DataFrame df, one column at a time.

Ans :
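The answer listing is not reproduced in this copy. A minimal sketch of one possible solution follows. It assumes the dataframe of Example 1 and uses items(), which yields the same (column index, Series) pairs and also works on pandas 2.0 and later; the names marks_data, col_name, col_series and columns_seen are illustrative:

```python
import pandas as pd

marks_data = {'Name': ['Ram', 'Mohan', 'Sachin'],
              'Marks': [95, 88, 89]}
df = pd.DataFrame(marks_data, index=['Rno 1', 'Rno 2', 'Rno 3'])

columns_seen = []  # illustrative helper: records the order in which columns are visited
# items() yields one (column name, Series) pair per column
for col_name, col_series in df.items():
    print('Column :', col_name)
    print(col_series)
    columns_seen.append(col_name)
```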

Print Specific Column from a Row :

Syntax :
for r, row in df.iterrows():
    row[<Column name>]
Example 3 : Write a program to print only the value from the Marks column, for each
row.
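A minimal sketch of Example 3, assuming the dataframe of Example 1 (the list marks_seen is an illustrative helper, not part of the original program):

```python
import pandas as pd

marks_data = {'Name': ['Ram', 'Mohan', 'Sachin'],
              'Marks': [95, 88, 89]}
df = pd.DataFrame(marks_data, index=['Rno 1', 'Rno 2', 'Rno 3'])

marks_seen = []
for r, row in df.iterrows():
    # row is a Series indexed by column name, so row['Marks'] picks one value
    print(r, ':', row['Marks'])
    marks_seen.append(row['Marks'])
```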
Binary Operations in a DataFrame :
Binary operations are operations that require two operands. In a dataframe, these
operands are picked elementwise, by matching row and column labels.
• For a matching row and column index, the given operation is performed.
• For a non-matching row or column index, NaN is stored in the result.

Consider the following dataframes (df1, df2, df3, df4) :

Addition Binary operation using + , add() and radd() :

Note – 1. The result is NaN for a non-matching row or column.

2. If a column contains a NaN value, the datatype of that column changes to
float, even if all its other values are integers.
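The dataframes df1 to df4 are not reproduced in this copy, so the sketch below uses two small hypothetical dataframes (left and right) to illustrate addition and both notes:

```python
import pandas as pd

# Hypothetical data: 'left' has three rows, 'right' has only two
left = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
right = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

total_op = left + right         # operator form
total_add = left.add(right)     # method form, same as left + right
total_radd = left.radd(right)   # reverse form, computes right + left

print(total_op)
print(total_op.dtypes)          # columns become float because of the NaN row
```

Row 2 exists only in left, so every value in that row of the result is NaN. Addition is commutative, so add() and radd() give the same result here.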
Subtraction Binary operation using - , sub() and rsub() :
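A minimal sketch of subtraction, using two small hypothetical dataframes (left and right, not the chapter's df1 to df4). It also shows why sub() and rsub() generally give different results:

```python
import pandas as pd

# Hypothetical data for illustration only
left = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
right = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

diff_op = left - right          # operator form
diff_sub = left.sub(right)      # same as left - right
diff_rsub = left.rsub(right)    # reverse form, computes right - left

print(diff_sub)
print(diff_rsub)
```

Subtraction is not commutative, so rsub() gives the negative of what sub() gives for every matching cell.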

Multiplication Binary operation using * and mul() :

Division Binary operation using / and div() :
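A minimal sketch of multiplication and division, again on two small hypothetical dataframes (left and right). Division uses the / operator or div():

```python
import pandas as pd

# Hypothetical data for illustration only
left = pd.DataFrame({'A': [2, 4], 'B': [6, 8]})
right = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

prod_op = left * right          # operator form
prod_mul = left.mul(right)      # method form

quot_op = left / right          # operator form
quot_div = left.div(right)      # method form

print(prod_mul)
print(quot_div)
```

Division returns float values even when both inputs hold integers.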

Descriptive Statistics with Pandas :
Pandas also includes many useful statistical functions :-

Reference dataframe is given below :-

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# 'prod_data' is used instead of 'dict' so that the built-in dict type is not shadowed
prod_data = {'Fruits':[7830.0,11950.0,113.1,7152.0,44.1,24169.2],
             'Pulses':[931.0,818.0,1.7,33.0,23.2,2184.4],
             'Rice':[7452.4,1930.0,2604.8,11856.2,814.6,13754.0],
             'Wheat':[np.nan,2737.0,np.nan,16440.5,0.5,30056.0] }
prodf = pd.DataFrame(prod_data, index = ['Andhra P.','Gujrat','kerala','Punjab',
                                         'Tripura', 'Uttar P.'])
print(prodf)

Functions min() and max() –

Finding minimum and maximum value column-wise:

Finding minimum and maximum value row-wise
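The outputs for these two cases are not reproduced in this copy. A minimal sketch on the reference dataframe prodf is given here; axis = 0 (the default) works column-wise, axis = 1 works row-wise, and NaN values are skipped by default. The result names col_min, col_max, row_min and row_max are illustrative:

```python
import numpy as np
import pandas as pd

prod_data = {'Fruits': [7830.0, 11950.0, 113.1, 7152.0, 44.1, 24169.2],
             'Pulses': [931.0, 818.0, 1.7, 33.0, 23.2, 2184.4],
             'Rice': [7452.4, 1930.0, 2604.8, 11856.2, 814.6, 13754.0],
             'Wheat': [np.nan, 2737.0, np.nan, 16440.5, 0.5, 30056.0]}
prodf = pd.DataFrame(prod_data, index=['Andhra P.', 'Gujrat', 'kerala',
                                       'Punjab', 'Tripura', 'Uttar P.'])

col_min = prodf.min()           # one value per column (axis=0 is the default)
col_max = prodf.max()
row_min = prodf.min(axis=1)     # one value per row
row_max = prodf.max(axis=1)

print(col_min)
print(row_max)
```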

Consider the following DataFrame dfmks :

# np.nan is used because the alias np.NaN was removed in NumPy 2.0
dfmks_data = {'A':[99,90,95,94,97],
              'B':[94.0,94.0,89.0,np.nan,100.0],
              'C':[92,92,91,99,99],
              'D':[97.0,97.0,89.0,95.0,np.nan]}
dfmks = pd.DataFrame(dfmks_data, index = ['Acct','Eco','Eng','IP','Math'])
print(dfmks)

Example : Consider the DataFrame (dfmks) given above. Write a program to print
the maximum marks scored in each subject, across all sections.

Example : Consider the DataFrame (dfmks) given above. Write a program to print
the maximum marks scored in each section, across all subjects.
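A minimal sketch of both examples, assuming that the rows of dfmks are subjects and the columns A to D are sections (the result names max_per_subject and max_per_section are illustrative):

```python
import numpy as np
import pandas as pd

dfmks_data = {'A': [99, 90, 95, 94, 97],
              'B': [94.0, 94.0, 89.0, np.nan, 100.0],
              'C': [92, 92, 91, 99, 99],
              'D': [97.0, 97.0, 89.0, 95.0, np.nan]}
dfmks = pd.DataFrame(dfmks_data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

# Maximum in each subject (row), taken across all sections
max_per_subject = dfmks.max(axis=1)
# Maximum in each section (column), taken across all subjects
max_per_section = dfmks.max()

print(max_per_subject)
print(max_per_section)
```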

Index of Maximum and Minimum Values : idxmax() and idxmin() :-

idxmax() – returns the index label of the maximum value along the given axis.
idxmin() – returns the index label of the minimum value along the given axis.
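A minimal sketch on the dataframe dfmks shown earlier. The result names are illustrative, and when the maximum occurs more than once the label of its first occurrence is returned:

```python
import numpy as np
import pandas as pd

dfmks_data = {'A': [99, 90, 95, 94, 97],
              'B': [94.0, 94.0, 89.0, np.nan, 100.0],
              'C': [92, 92, 91, 99, 99],
              'D': [97.0, 97.0, 89.0, 95.0, np.nan]}
dfmks = pd.DataFrame(dfmks_data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

col_max_idx = dfmks.idxmax()          # row label of the maximum in each column
col_min_idx = dfmks.idxmin()          # row label of the minimum in each column
row_max_idx = dfmks.idxmax(axis=1)    # column label of the maximum in each row

print(col_max_idx)
print(col_min_idx)
print(row_max_idx)
```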

Functions mode(), mean() and median() –

The functions mode(), mean() and median() are common statistical functions.

mode() – returns the mode value (i.e. the value that appears most often in a
given set of values).

mean() – returns the computed mean (average) of a set of values.

median() – returns the middle value of a set of numbers, i.e. the value that
separates the higher half from the lower half of the set.

Consider the following dataframe :

Now calculate the mode, median and mean.

(1) Calculating mode – mode() :

(2) Calculating median – median() :

(3) Calculating mean – mean() :

Consider the dataframe mksdf :

lst1 = [99,94,95,94,97]
lst2 = [94,94,89,87,100]
lst3 = [92,92,91,99,99]
lst4 = [99,97,89,94,99]
# 'mks_data' is used instead of 'dict' so that the built-in dict type is not shadowed
mks_data = {'A':lst1,'B':lst2,'C':lst3,'D':lst4}
mksdf = pd.DataFrame(mks_data)
mksdf.index = ['Acct','Eco','Eng','IP','Math']
print(mksdf)
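A minimal sketch of the three functions applied column-wise to mksdf (the result names mode_df, median_s and mean_s are illustrative):

```python
import pandas as pd

mks_data = {'A': [99, 94, 95, 94, 97],
            'B': [94, 94, 89, 87, 100],
            'C': [92, 92, 91, 99, 99],
            'D': [99, 97, 89, 94, 99]}
mksdf = pd.DataFrame(mks_data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

mode_df = mksdf.mode()        # a DataFrame: one row per mode, padded with NaN
median_s = mksdf.median()     # a Series: one median per column
mean_s = mksdf.mean()         # a Series: one mean per column

print(mode_df)
print(median_s)
print(mean_s)
```

Column C has two modes (92 and 99), so mode() returns two rows here and fills the second row of the other columns with NaN. Passing axis = 1 to any of the three functions computes the statistic row-wise instead.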

Homework –

1. Name the function to iterate over a DataFrame horizontally.

2. Name the function to iterate over a DataFrame vertically.

3. Is the result of sub() and rsub() the same? Why/why not ?

4. What are binary operations ? Name the functions that let you perform
binary operations on a DataFrame.

5. Write a program to print a DataFrame one column at a time and print
only the first three columns.
Combining DataFrames :

Combining DataFrames using concat() : This method is useful if the
two dataframes have similar structures.

df1 = pd.DataFrame({'Sub_id':[1,2,3,4,5],
                    'Fname':['amit','ajay','vikas','vaibhav','jia'],
                    'Lname':['shukla','tiwari','madan','parmar','jain']})

df2 = pd.DataFrame({'Sub_id':[4,5,6,7,8],
                    'Fname':['anil','rita','akshay','kapil','ms'],
                    'Lname':['gupta','sharma','sinha','dev','dhoni']})

df3 = pd.DataFrame({'Sub-id':[1,2,3,4,5,7,8,9,10,11],
                    'Test-id':[51,15,15,61,16,14,15,1,61,16]})

By default, concat() concatenates along the rows (axis = 0). To concatenate along
the columns, we can give the argument axis = 1.
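A minimal sketch of both directions, using df1 and df2 from above (the result names stacked, renumbered and side_by_side are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Sub_id': [1, 2, 3, 4, 5],
                    'Fname': ['amit', 'ajay', 'vikas', 'vaibhav', 'jia'],
                    'Lname': ['shukla', 'tiwari', 'madan', 'parmar', 'jain']})
df2 = pd.DataFrame({'Sub_id': [4, 5, 6, 7, 8],
                    'Fname': ['anil', 'rita', 'akshay', 'kapil', 'ms'],
                    'Lname': ['gupta', 'sharma', 'sinha', 'dev', 'dhoni']})

# Default: rows of df2 are placed below the rows of df1; old index labels are kept
stacked = pd.concat([df1, df2])
# ignore_index=True renumbers the rows 0..9 in the result
renumbered = pd.concat([df1, df2], ignore_index=True)
# axis=1 places the columns of df2 to the right of the columns of df1
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```

Note that concat() does not remove duplicates: Sub_id 4 and 5 appear in both dataframes and therefore appear twice in the stacked result.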

Combining DataFrames using join() : Creates a new dataframe from two dataframes
by joining their rows on matching index values.

1. Inner join – Takes only the rows whose indexes are common to both the
dataframes.

df1 = pd.DataFrame({'Name':['Khushboo','Prarthana','Aman','Kamal']},
                   index = [1,2,3,4])

df2 = pd.DataFrame({'Marks':[19,18,12]}, index = [1,3,7])

2. Left join – Takes all the rows from the left (first) dataframe and joins with
them only those rows from the second dataframe that have indexes in common
with dataframe 1. This is the default for join().

3. Right join – Takes all the rows from the right (second) dataframe and
joins with them only those rows from the first dataframe that have indexes
in common with dataframe 2.

4. Outer join – Takes all rows from both the dataframes and joins them.
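The four kinds of join listed above can be sketched with the how argument of join(), using the df1 and df2 defined in the list (the result names are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Khushboo', 'Prarthana', 'Aman', 'Kamal']},
                   index=[1, 2, 3, 4])
df2 = pd.DataFrame({'Marks': [19, 18, 12]}, index=[1, 3, 7])

inner_join = df1.join(df2, how='inner')   # only indexes 1 and 3
left_join = df1.join(df2)                 # how='left' is the default
right_join = df1.join(df2, how='right')   # all indexes of df2
outer_join = df1.join(df2, how='outer')   # all indexes of both dataframes

print(inner_join)
print(left_join)
print(right_join)
print(outer_join)
```

Wherever one dataframe has no row for an index, the columns coming from that dataframe are filled with NaN.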
Joining on a Column : We can pass a column name of dataframe 1 to the on
argument of the join() method.

Df1.join(Df2, on = <column name of Df1>)

When we specify the on argument, the values of the named column of the left
dataframe are matched against the index of the second dataframe.

Consider the following dataframes named Df1 and Df2 :

Df1 = pd.DataFrame({'Cust_id':[1,2,3,4,5,6],
                    'Product':['Oven','AC','AC','Speaker','Tablet','Smartphone']})

Df2 = pd.DataFrame({'P_id':[2,4,6],
                    'State':['Delhi','Goa','kerala']})
Now, change the column name from P_id to Cust_id :

Now join :

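>>> Df2.join(Df1, on = 'Cust_id')
ValueError: columns overlap but no suffix specified: Index(['Cust_id'], dtype='object')

The error arises because both dataframes now contain a column named Cust_id, and
join() cannot decide how to name the overlapping columns in the result.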
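The rename step and the working join are not reproduced in this copy. One way to complete the example is sketched below, under two assumptions: rename() is used for the rename step, and Cust_id is moved into the index of the right-hand dataframe so that the on column of Df1 has an index to match against:

```python
import pandas as pd

Df1 = pd.DataFrame({'Cust_id': [1, 2, 3, 4, 5, 6],
                    'Product': ['Oven', 'AC', 'AC', 'Speaker', 'Tablet', 'Smartphone']})
Df2 = pd.DataFrame({'P_id': [2, 4, 6],
                    'State': ['Delhi', 'Goa', 'kerala']})

# Assumed rename step: P_id becomes Cust_id
Df2 = Df2.rename(columns={'P_id': 'Cust_id'})

# Make Cust_id the index of the right dataframe, then join on Df1's Cust_id column
joined = Df1.join(Df2.set_index('Cust_id'), on='Cust_id')
print(joined)
```

Because join() defaults to a left join, all six customers are kept and State is NaN for customers that do not appear in Df2. Passing lsuffix or rsuffix would also avoid the error, but the values of the on column would then still be matched against the default integer index of the other dataframe, not against its Cust_id column.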
Combining DataFrames using merge() : merge() combines two dataframes such
that rows with common values in a given field are merged together in the final
result. We can specify the field on the basis of which we want to combine
the two dataframes.

Where :

➢ The on argument takes the name of a column found in both the dataframes.
➢ The default for merge() is an 'inner' join.
➢ If we skip the argument on = <field_name>, merge() joins on all the
columns that are common to both dataframes.
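The three points above can be sketched with the Df1 and Df2 used for join(). Here Df2 is created directly with a Cust_id column, and the result names are illustrative:

```python
import pandas as pd

Df1 = pd.DataFrame({'Cust_id': [1, 2, 3, 4, 5, 6],
                    'Product': ['Oven', 'AC', 'AC', 'Speaker', 'Tablet', 'Smartphone']})
Df2 = pd.DataFrame({'Cust_id': [2, 4, 6],
                    'State': ['Delhi', 'Goa', 'kerala']})

# Inner join (the default) on the common column Cust_id
merged_inner = pd.merge(Df1, Df2, on='Cust_id')
# Without on, merge() uses the columns common to both dataframes (here Cust_id)
merged_default = pd.merge(Df1, Df2)
# how='left' keeps every row of Df1
merged_left = pd.merge(Df1, Df2, on='Cust_id', how='left')

print(merged_inner)
print(merged_left)
```

Unlike join(), merge() matches on column values by default, so no index has to be set first.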
