
Chapter 2 Python Pandas - II

Chapter 2 of the document focuses on advanced operations with Python Pandas, particularly dataframes, including iterating over rows and columns, performing binary operations, and descriptive statistics. It covers functions such as iterrows(), iteritems(), and various statistical functions like mode(), mean(), and median(). Additionally, it explains methods for combining dataframes using concat(), join(), and merge(), along with examples and homework questions for practice.

Uploaded by

mainshabhatnagar
Copyright
© All Rights Reserved

Chapter-2

Python Pandas – II
Introduction – In this chapter we take a closer look at dataframes: basic
operations on a dataframe, descriptive statistics, pivoting, handling missing data,
combining/merging, etc.
Iterating Over a DataFrame
iterrows() – Iterates over a dataframe row-wise. Each horizontal subset is
returned as a (row index, Series) pair, where the Series contains all column
values for that row index.
Example 1 : Write a program to print the DataFrame df, one row at a time.
import pandas as pd
data = {'Name':['Ram','Mohan','Sachin'],
        'Marks':[95,88,89]}
df = pd.DataFrame(data, index = ['Rno 1','Rno 2','Rno 3'])
print(df)

Ans :
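One possible answer is sketched below. It assumes df is built exactly as in Example 1; the loop variable names r and row are illustrative choices.

```python
import pandas as pd

data = {'Name': ['Ram', 'Mohan', 'Sachin'],
        'Marks': [95, 88, 89]}
df = pd.DataFrame(data, index=['Rno 1', 'Rno 2', 'Rno 3'])

# iterrows() yields (row index, Series) pairs, one pair per row
for r, row in df.iterrows():
    print('Row index:', r)
    print(row)            # Series holding every column value of this row
    print('-' * 20)

# Collect the row labels in the order they were visited
rows_seen = [r for r, _ in df.iterrows()]
```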

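iteritems() function – This function returns a vertical subset from a dataframe
in the form of a column index and a Series object containing the values for all
rows in that column. Note: iteritems() was deprecated in pandas 1.5 and removed
in pandas 2.0; in current versions use items(), which behaves the same way.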
Example 2 : Write a program to print the DataFrame df, one column at a time.

Ans :
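A minimal sketch of one possible answer, assuming the same df as in Example 1. It uses items(), because iteritems() is not available in pandas 2.0 and later; on older versions either name works.

```python
import pandas as pd

data = {'Name': ['Ram', 'Mohan', 'Sachin'],
        'Marks': [95, 88, 89]}
df = pd.DataFrame(data, index=['Rno 1', 'Rno 2', 'Rno 3'])

# items() yields (column name, Series) pairs, one pair per column
for col, series in df.items():
    print('Column:', col)
    print(series)         # Series holding the values of this column for all rows
    print('-' * 20)

# Collect the column names in the order they were visited
cols_seen = [col for col, _ in df.items()]
```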

Print Specific Column from a Row :

Syntax :
for r, row in df.iterrows():
    row[<column name>]
Example 3 : Write a program to print only the value from the Marks column, for
each row.
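A possible solution to Example 3, sketched under the assumption that df is the dataframe from Example 1. The list marks_seen is added only to make the result easy to check.

```python
import pandas as pd

data = {'Name': ['Ram', 'Mohan', 'Sachin'],
        'Marks': [95, 88, 89]}
df = pd.DataFrame(data, index=['Rno 1', 'Rno 2', 'Rno 3'])

marks_seen = []
for r, row in df.iterrows():
    # row is a Series, so a single column value is picked by its column name
    print(r, ':', row['Marks'])
    marks_seen.append(row['Marks'])
```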
Binary Operations in a DataFrame :
Binary operations are operations that require two values (operands). In a
dataframe, these values are picked element-wise from the two dataframes.
• For matching row and column indexes, the given operation is performed.
• For non-matching row or column indexes, NaN is stored in the result.

Consider the following dataframes (df1, df2, df3, df4) :


Addition Binary operation using + , add() and radd() :

Note –
1. The result is NaN for any non-matching row or column.
2. If a column contains a NaN value, the datatype of that column changes to
float, even if all the other values in that column are integers.
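The two notes can be checked with a small sketch. The dataframes dfa and dfb are hypothetical stand-ins (they are not the chapter's df1 to df4); dfb deliberately has one row fewer than dfa.

```python
import pandas as pd

# Hypothetical dataframes used only for illustration
dfa = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})   # rows 0, 1, 2
dfb = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})     # rows 0, 1

s1 = dfa + dfb        # operator form
s2 = dfa.add(dfb)     # method form, same result as dfa + dfb
s3 = dfa.radd(dfb)    # reversed operands, i.e. dfb + dfa

print(s1)             # row 2 has no partner in dfb, so it is NaN
print(s1.dtypes)      # integer columns become float because of the NaN
```

Since addition is commutative, add() and radd() give the same values here; the difference between the normal and the reversed form only shows up for operations such as subtraction.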
Subtraction Binary operation using - , sub() and rsub() :
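A short sketch of the three forms, using hypothetical dataframes dfa and dfb with identical row and column indexes:

```python
import pandas as pd

# Hypothetical dataframes used only for illustration
dfa = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
dfb = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

d1 = dfa - dfb        # operator form
d2 = dfa.sub(dfb)     # method form, same result as dfa - dfb
d3 = dfa.rsub(dfb)    # reversed operands, i.e. dfb - dfa

print(d1)
print(d3)             # same magnitudes as d1 but with opposite sign
```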

Multiplication Binary operation using * and mul() :
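A minimal sketch with hypothetical dataframes dfa and dfb that share the same row and column indexes:

```python
import pandas as pd

# Hypothetical dataframes used only for illustration
dfa = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
dfb = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})

m1 = dfa * dfb        # operator form: element-wise product
m2 = dfa.mul(dfb)     # method form, same result as dfa * dfb

print(m1)
```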

Division Binary operation using / and div() :
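A minimal sketch with hypothetical dataframes dfa and dfb. Division always produces float results, even when both inputs hold integers.

```python
import pandas as pd

# Hypothetical dataframes used only for illustration
dfa = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
dfb = pd.DataFrame({'A': [2, 5], 'B': [3, 8]})

q1 = dfa / dfb        # operator form: element-wise true division
q2 = dfa.div(dfb)     # method form, same result as dfa / dfb

print(q1)
print(q1.dtypes)      # float64 for every column
```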


Descriptive Statistics with Pandas :
Pandas also includes many useful statistical functions :-

Reference dataframe is given below :-


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Fruits':[7830.0,11950.0,113.1,7152.0,44.1,24169.2],
        'Pulses':[931.0,818.0,1.7,33.0,23.2,2184.4],
        'Rice':[7452.4,1930.0,2604.8,11856.2,814.6,13754.0],
        'Wheat':[np.nan,2737.0,np.nan,16440.5,0.5,30056.0] }
prodf = pd.DataFrame(data, index = ['Andhra P.','Gujrat','kerala','Punjab',
                                    'Tripura', 'Uttar P.'])
print(prodf)

Functions min() and max() –


Finding minimum and maximum value column-wise:

Finding minimum and maximum value row-wise
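Both cases can be sketched with the reference dataframe prodf. The axis argument selects the direction: axis=0 (the default) works down each column, axis=1 works across each row. NaN values are skipped by default.

```python
import numpy as np
import pandas as pd

data = {'Fruits': [7830.0, 11950.0, 113.1, 7152.0, 44.1, 24169.2],
        'Pulses': [931.0, 818.0, 1.7, 33.0, 23.2, 2184.4],
        'Rice': [7452.4, 1930.0, 2604.8, 11856.2, 814.6, 13754.0],
        'Wheat': [np.nan, 2737.0, np.nan, 16440.5, 0.5, 30056.0]}
prodf = pd.DataFrame(data, index=['Andhra P.', 'Gujrat', 'kerala', 'Punjab',
                                  'Tripura', 'Uttar P.'])

col_min = prodf.min()          # minimum of each column (axis=0 is the default)
col_max = prodf.max()          # maximum of each column
row_min = prodf.min(axis=1)    # minimum of each row
row_max = prodf.max(axis=1)    # maximum of each row

print(col_min, col_max, row_min, row_max, sep='\n\n')
```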

Consider following DataFrame dfmks

data = {'A':[99,90,95,94,97],
        'B':[94.0,94.0,89.0,np.nan,100.0],
        'C':[92,92,91,99,99],
        'D':[97.0,97.0,89.0,95.0,np.nan]}
dfmks = pd.DataFrame(data, index = ['Acct','Eco','Eng','IP','Math'])
print(dfmks)

Example : Consider the DataFrame (dfmks) given above. Write a program to print
the maximum marks scored in each subject across all sections.
Example : Consider the DataFrame (dfmks) given above. Write a program to print
the maximum marks scored in a section, across all subjects.
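A possible solution to both examples is sketched below. It assumes that the rows of dfmks are subjects and the columns A to D are sections, as the index labels suggest.

```python
import numpy as np
import pandas as pd

data = {'A': [99, 90, 95, 94, 97],
        'B': [94.0, 94.0, 89.0, np.nan, 100.0],
        'C': [92, 92, 91, 99, 99],
        'D': [97.0, 97.0, 89.0, 95.0, np.nan]}
dfmks = pd.DataFrame(data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

# Maximum marks in each subject, across all sections (one value per row)
max_per_subject = dfmks.max(axis=1)
print(max_per_subject)

# Maximum marks in each section, across all subjects (one value per column)
max_per_section = dfmks.max()
print(max_per_section)
```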

Index of Maximum and Minimum Values : idxmax() and idxmin() :-


idxmax() – returns the index label at which the maximum value occurs.
idxmin() – returns the index label at which the minimum value occurs.
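A short sketch on dfmks, assuming the same data as above. When the extreme value occurs more than once, the label of its first occurrence is returned.

```python
import numpy as np
import pandas as pd

data = {'A': [99, 90, 95, 94, 97],
        'B': [94.0, 94.0, 89.0, np.nan, 100.0],
        'C': [92, 92, 91, 99, 99],
        'D': [97.0, 97.0, 89.0, 95.0, np.nan]}
dfmks = pd.DataFrame(data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

top_rows = dfmks.idxmax()          # row label of the maximum in each column
low_rows = dfmks.idxmin()          # row label of the minimum in each column
top_cols = dfmks.idxmax(axis=1)    # column label of the maximum in each row

print(top_rows, low_rows, top_cols, sep='\n\n')
```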

Functions mode(), mean() and median() –

mode(), mean() and median() are common statistical functions.

mode() – returns the mode, i.e. the value that appears most often in a given
set of values. If several values tie, all of them are returned.

mean() – returns the computed mean (average) of a set of values.

median() – returns the median, i.e. the middle value that separates the higher
half of a set of values from the lower half.

Consider following dataframe :

Now calculate mode, median and mean.

(1) Calculating mode – mode() :

(2) Calculating median – median() :

(3) Calculating mean – mean() :

Consider dataframe mksdf :

lst1 = [99,94,95,94,97]
lst2 = [94,94,89,87,100]
lst3 = [92,92,91,99,99]
lst4 = [99,97,89,94,99]
data = {'A':lst1,'B':lst2,'C':lst3,'D':lst4}
mksdf = pd.DataFrame(data)
mksdf.index = ['Acct','Eco','Eng','IP','Math']
print(mksdf)
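The three functions can be tried on mksdf as sketched below. All of them work column-wise by default; passing axis=1 would compute the statistic for each row instead.

```python
import pandas as pd

data = {'A': [99, 94, 95, 94, 97],
        'B': [94, 94, 89, 87, 100],
        'C': [92, 92, 91, 99, 99],
        'D': [99, 97, 89, 94, 99]}
mksdf = pd.DataFrame(data, index=['Acct', 'Eco', 'Eng', 'IP', 'Math'])

modes = mksdf.mode()        # DataFrame: one row per mode, NaN where a column has fewer modes
medians = mksdf.median()    # Series: middle value of each column
means = mksdf.mean()        # Series: average of each column

print(modes)                # column C has two modes (92 and 99), so there are two rows
print(medians)
print(means)
```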

Homework –

1. Name the function to iterate over a DataFrame horizontally.

2. Name the function to iterate over a DataFrame vertically.

3. Is the result of sub() and rsub() the same? Why/why not ?

4. What are binary operations ? Name the functions that let you perform
binary operations on a DataFrame.

5. Write a program to print a DataFrame one column at a time and print
only the first three columns.
Combining DataFrames :

Combining DataFrames using concat() : This method is useful if the
two dataframes have similar structures.

df1 = pd.DataFrame({'Sub_id':[1,2,3,4,5],
                    'Fname':['amit','ajay','vikas','vaibhav','jia'],
                    'Lname':['shukla','tiwari','madan','parmar','jain']})

df2 = pd.DataFrame({'Sub_id':[4,5,6,7,8],
                    'Fname':['anil','rita','akshay','kapil','ms'],
                    'Lname':['gupta','sharma','sinha','dev','dhoni']})

df3 = pd.DataFrame({'Sub-id':[1,2,3,4,5,7,8,9,10,11],
                    'Test-id':[51,15,15,61,16,14,15,1,61,16]})
By default, concat() concatenates along the rows (axis = 0). To concatenate
along the columns, give the argument axis = 1.
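Both directions are sketched below with df1 and df2. The ignore_index=True variant is an addition not mentioned in the text; it renumbers the rows of the result instead of repeating the original labels.

```python
import pandas as pd

df1 = pd.DataFrame({'Sub_id': [1, 2, 3, 4, 5],
                    'Fname': ['amit', 'ajay', 'vikas', 'vaibhav', 'jia'],
                    'Lname': ['shukla', 'tiwari', 'madan', 'parmar', 'jain']})
df2 = pd.DataFrame({'Sub_id': [4, 5, 6, 7, 8],
                    'Fname': ['anil', 'rita', 'akshay', 'kapil', 'ms'],
                    'Lname': ['gupta', 'sharma', 'sinha', 'dev', 'dhoni']})

by_rows = pd.concat([df1, df2])                        # df2 stacked below df1, labels 0..4 repeat
renumbered = pd.concat([df1, df2], ignore_index=True)  # same rows, labels 0..9
by_cols = pd.concat([df1, df2], axis=1)                # df2 placed beside df1

print(by_rows)
print(by_cols)
```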

Combining DataFrames using join() : Creates a new dataframe from two
dataframes by joining the rows that have matching indexes.

1. Inner Join – Take the rows having common indexes from both the
dataframes.

df1 = pd.DataFrame({'Name':['Khushboo','Prarthana','Aman','Kamal']},
                   index = [1,2,3,4])

df2 = pd.DataFrame({'Marks':[19,18,12]}, index = [1,3,7])


2. Left join – Take all the rows from the left (first) dataframe and join with
them only those rows from the second dataframe whose indexes also occur in
dataframe 1. It is the default join for join().

3. Right join – Take all the rows from the right (second) dataframe and
join with it only those rows from the first dataframe that have common
indexes as dataframe 2.

4. Outer join – Take all rows from both the dataframes and join them.
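The four join types can be compared on df1 and df2 as sketched below; the how argument of join() selects the type.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Khushboo', 'Prarthana', 'Aman', 'Kamal']},
                   index=[1, 2, 3, 4])
df2 = pd.DataFrame({'Marks': [19, 18, 12]}, index=[1, 3, 7])

inner = df1.join(df2, how='inner')   # only indexes present in both: 1 and 3
left = df1.join(df2, how='left')     # every index of df1; Marks is NaN for 2 and 4
right = df1.join(df2, how='right')   # every index of df2; Name is NaN for 7
outer = df1.join(df2, how='outer')   # every index of either dataframe

default = df1.join(df2)              # no how argument: behaves like the left join

for name, frame in [('inner', inner), ('left', left),
                    ('right', right), ('outer', outer)]:
    print(name)
    print(frame)
```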
Joining on a Column : We can pass a column name of dataframe 1 to the
on argument of the join() method.

Df1.join(Df2, on = <column name of Df1>)

When we specify the on argument, the values of the named column of the left
dataframe are matched with the indexes of the second dataframe.

Consider the following dataframes named Df1 and Df2 :

Df1 = pd.DataFrame({'Cust_id':[1,2,3,4,5,6],
                    'Product':['Oven','AC','AC','Speaker','Tablet','Smartphone']})

Df2 = pd.DataFrame({'P_id':[2,4,6],
                    'State':['Delhi','Goa','kerala']})
Now, change the column name from P_id to Cust_id :

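Now join :

>>> Df2.join(Df1, on = 'Cust_id')
ValueError: columns overlap but no suffix specified: Index(['Cust_id'], dtype='object')

The error occurs because both dataframes now contain a column named Cust_id
and no lsuffix/rsuffix argument was given to tell the two columns apart.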
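The steps above can be reproduced with the sketch below. The rename() call is one way to change P_id to Cust_id, and moving Cust_id into the index of Df1 with set_index() is one possible way to avoid the overlap; neither is necessarily the approach intended by the chapter.

```python
import pandas as pd

Df1 = pd.DataFrame({'Cust_id': [1, 2, 3, 4, 5, 6],
                    'Product': ['Oven', 'AC', 'AC', 'Speaker', 'Tablet', 'Smartphone']})
Df2 = pd.DataFrame({'P_id': [2, 4, 6],
                    'State': ['Delhi', 'Goa', 'kerala']})

# Change the column name from P_id to Cust_id
Df2 = Df2.rename(columns={'P_id': 'Cust_id'})

# Both dataframes now have a Cust_id column, so a plain join raises ValueError
try:
    Df2.join(Df1, on='Cust_id')
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)
    print('ValueError:', err)

# Assumed fix: make Cust_id the index of Df1, so that the Cust_id values of Df2
# are matched against that index and no column name appears twice
result = Df2.join(Df1.set_index('Cust_id'), on='Cust_id')
print(result)
```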
Combining DataFrames using merge() : merge() combines two dataframes such
that rows with common values in a given field are merged together in the final
result. We can specify the field on the basis of which we want to combine
the two dataframes.

Where :

➢ The on argument takes a column name found in both the dataframes.
➢ The default is an ‘inner’ join for merge().
➢ If we skip the argument on = <field_name>, the merge is performed on all
fields common to both dataframes.
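The three points above can be sketched as follows. The dataframes reuse the customer data from the join() section, with Df2 assumed to already carry the column name Cust_id.

```python
import pandas as pd

Df1 = pd.DataFrame({'Cust_id': [1, 2, 3, 4, 5, 6],
                    'Product': ['Oven', 'AC', 'AC', 'Speaker', 'Tablet', 'Smartphone']})
Df2 = pd.DataFrame({'Cust_id': [2, 4, 6],
                    'State': ['Delhi', 'Goa', 'kerala']})

inner = pd.merge(Df1, Df2, on='Cust_id')             # default how='inner': matching rows only
no_on = pd.merge(Df1, Df2)                           # on omitted: the common column Cust_id is used
left = pd.merge(Df1, Df2, on='Cust_id', how='left')  # keep every row of Df1

print(inner)
print(left)
```

Unlike join(), merge() matches on column values rather than on the index, so no overlap error arises for the shared Cust_id column.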
