Unit 1: Data Handling using Pandas and Data Visualization
Visit the website: https://www.learnpython4cbse.com
Chapter-4 Data Handling Using Pandas-I
To access a DataFrame through a Boolean index, we first create a DataFrame
whose index contains Boolean values, that is, True or False.
For example:
# Created by: Amjad Khan (06.06.2020)
# Create DataFrame with Boolean Index
import pandas as pd
Srec = {'sid': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
        'sname': ['Amit', 'Sumit', 'Aman', 'Rama', 'Neeta', 'Amjad',
                  'Ram', 'Ilma', 'Raja', 'Pawan'],
        'smarks': [98, 67, 85, 56, 38, 98, 67, 28, 56, 81],
        'sgrade': ['A1', 'B2', 'A1', 'C1', 'D', 'A1', 'B2', 'E', 'B2', 'A2'],
        'remark': ['P', 'P', 'P', 'F', 'P', 'P', 'P', 'F', 'P', 'P']
        }
# Convert the dictionary into DataFrame
df = pd.DataFrame(Srec)
# Without Boolean Index display
print("\n-# Without Boolean Index display-\n")
print(df)
df = pd.DataFrame(Srec, index=[True, False, True, False,
                               True, False, True, False, False, True])
# With Boolean Index display
print("\n-With Boolean Index display-\n")
print(df)
-With Boolean Index display-
       sid  sname  smarks sgrade remark
True   101   Amit      98     A1      P
False  102  Sumit      67     B2      P
True   103   Aman      85     A1      P
False  104   Rama      56     C1      F
True   105  Neeta      38      D      P
False  106  Amjad      98     A1      P
True   107    Ram      67     B2      P
False  108   Ilma      28      E      F
False  109   Raja      56     B2      P
True   110  Pawan      81     A2      P
This example shows how to access the DataFrame with a Boolean index by
using .loc[ ]. We simply pass a Boolean value (True or False) to .loc[ ].
Here I passed True, so it displays only the rows whose index label is True.
# .loc[ ] selects rows by index label
# so, we pass the label True instead of an index number.
# accessing using .loc[True]
print("\n-accessing using .loc[True]-\n")
print(df.loc[True])
-accessing using .loc[True]-
      sid  sname  smarks sgrade remark
True  101   Amit      98     A1      P
True  103   Aman      85     A1      P
True  105  Neeta      38      D      P
True  107    Ram      67     B2      P
True  110  Pawan      81     A2      P
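The same idea can be tried on a smaller table. The sketch below is only an illustration: the three-row `marks` table and its values are made up, and the second selection (a Boolean mask built from the index) is an alternative that is not used in the notes above.

```python
import pandas as pd

# Hypothetical three-row table; names and marks are illustrative only.
marks = pd.DataFrame(
    {"sname": ["Amit", "Sumit", "Aman"], "smarks": [98, 67, 85]},
    index=[True, False, True],
)

# .loc[ ] looks rows up by index label, and here the labels are Booleans.
true_rows = marks.loc[True]

# Alternative (an assumption, not from the notes): use the index as a mask.
mask_rows = marks[marks.index.to_numpy(dtype=bool)]

print(true_rows)
print(mask_rows)
```

Both selections keep the two rows labelled True and drop the row labelled False.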
To access a DataFrame using .iloc[ ], we pass integer positions rather than
Boolean labels, because .iloc[ ] selects rows by position. Here I passed two
positions (1, 4); you can also pass a single position, for example: print(df.iloc[2])
# iloc method takes only integers
# so, we are passing 1,4 instead of True.
# accessing using .iloc[[1,4]]
print("\n-accessing using .iloc[[1,4]]-\n")
print(df.iloc[[1, 4]])
-accessing using .iloc[[1,4]]-
       sid  sname  smarks sgrade remark
False  102  Sumit      67     B2      P
True   105  Neeta      38      D      P
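To see the three common ways of calling .iloc[ ] side by side, here is a small sketch. The four-row `marks` table is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Hypothetical four-row table with a Boolean index; values are illustrative.
marks = pd.DataFrame(
    {"sname": ["Amit", "Sumit", "Aman", "Rama"], "smarks": [98, 67, 85, 56]},
    index=[True, False, True, False],
)

one_row = marks.iloc[2]        # single position -> a Series holding that row
two_rows = marks.iloc[[1, 3]]  # list of positions -> a DataFrame
first_two = marks.iloc[0:2]    # slice of positions, end position excluded

print(one_row)
print(two_rows)
print(first_two)
```

Positions always start at 0 and do not depend on what the index labels are.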
In this tutorial, you'll learn how and when to combine your data in
Pandas with:
* join() for combining data on a key column or an index
* merge() for combining data on common columns or indices
* concat() for combining DataFrames across rows or columns
To concatenate DataFrames, we use the pd.concat() function. It joins the
given DataFrames and returns a new DataFrame.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False, copy=True)
Note: join_axes is accepted only by older pandas releases; it was removed in pandas 1.0.
Important parameters for the concat( ) method:
i. objs: a sequence of pandas objects to concatenate.
ii. axis: default is 0, i.e. row-wise concatenation. If axis=1, then
column-wise concatenation is performed.
iii. keys: labels used to create a multi-index. Useful for marking the
source objects in the output.
iv. ignore_index: if True, the source objects' indexes are ignored and
0, 1, 2, ..., n-1 indexes are used in the output.
v. join: optional parameter that defines how to handle the indexes
on the other axis. The valid values are 'inner' and 'outer'.
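The effect of ignore_index and join is easiest to see on two frames that do not share all their columns. The sketch below is a minimal illustration; the frames `a` and `b` and their values are made up.

```python
import pandas as pd

# Hypothetical frames: `b` lacks the "marks" column that `a` has.
a = pd.DataFrame({"sroll": ["S0", "S1"], "marks": [78, 90]})
b = pd.DataFrame({"sroll": ["S4", "S6"]})

# ignore_index=True discards the original 0,1 / 0,1 labels and renumbers 0..3.
# With the default join='outer', the missing "marks" values become NaN.
renumbered = pd.concat([a, b], ignore_index=True)

# join='inner' keeps only the columns present in every input frame.
common_only = pd.concat([a, b], join="inner", ignore_index=True)

print(renumbered)
print(common_only)
```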
# importing pandas module
import pandas as pd

# Define a dictionary containing student data
rec1 = {'sroll': ['S0', 'S1', 'S2', 'S3'],
        'sname': ['Kamal', 'Princ', 'Gagan', 'Amit'],
        'sage': [15, 14, 16, 17],
        'marks': [78, 90, 76, 56]}
# Define a dictionary containing student data
rec2 = {'sroll': ['S4', 'S4', 'S6', 'S7'],
        'sname': ['Kimmi', 'Pranav', 'Pankaj', 'Sumit'],
        'sage': [13, 15, 17, 18],
        'marks': [78, 90, 76, 56]}

dataf1 = pd.DataFrame(rec1)
dataf2 = pd.DataFrame(rec2)
print(dataf1, "\n\n", dataf2)

Output:
  sroll  sname  sage  marks
0    S0  Kamal    15     78
1    S1  Princ    14     90
2    S2  Gagan    16     76
3    S3   Amit    17     56

  sroll   sname  sage  marks
0    S4   Kimmi    13     78
1    S4  Pranav    15     90
2    S6  Pankaj    17     76
3    S7   Sumit    18     56
Example 1:
# Code 1: Now we apply the .concat() function in order to
# concatenate two DataFrames along the rows
# using a .concat() method
print("\n-Concate dataf1 and dataf2 using .concat() along")
print("row wise-\n")
frames = [dataf1, dataf2]
res1 = pd.concat(frames)
# OR
# res1 = pd.concat([dataf1, dataf2])
print(res1)
-Concate dataf1 and dataf2 using .concat() along
row wise-
  sroll   sname  sage  marks
0    S0   Kamal    15     78
1    S1   Princ    14     90
2    S2   Gagan    16     76
3    S3    Amit    17     56
0    S4   Kimmi    13     78
1    S4  Pranav    15     90
2    S6  Pankaj    17     76
3    S7   Sumit    18     56
Example 2:
Pandas also lets you label the DataFrames after concatenation with a key, so
that you know which rows came from which DataFrame. You do this by passing
the additional argument keys, a list of label names for the DataFrames.
Here we perform the same concatenation with keys X and Y for DataFrames
dataf1 and dataf2 respectively.
print("\n-Label the DataFrames,after the concatenation-\n")
res2 = pd.concat(frames, keys=['X', 'Y'])
print(res2)
-Label the DataFrames,after the concatenation-
    sroll   sname  sage  marks
X 0    S0   Kamal    15     78
  1    S1   Princ    14     90
  2    S2   Gagan    16     76
  3    S3    Amit    17     56
Y 0    S4   Kimmi    13     78
  1    S4  Pranav    15     90
  2    S6  Pankaj    17     76
  3    S7   Sumit    18     56
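Once the keys are in place, they can be used to pull the rows of one source DataFrame back out. The sketch below uses shortened, two-row versions of dataf1 and dataf2 (an assumption made to keep the example small); the selection calls are ordinary .loc[ ] lookups on the resulting multi-index.

```python
import pandas as pd

# Shortened stand-ins for dataf1 and dataf2 (two rows each, for illustration).
dataf1 = pd.DataFrame({"sroll": ["S0", "S1"], "sname": ["Kamal", "Princ"]})
dataf2 = pd.DataFrame({"sroll": ["S4", "S6"], "sname": ["Kimmi", "Pankaj"]})

res2 = pd.concat([dataf1, dataf2], keys=["X", "Y"])

# All rows that came from dataf1 (outer label "X").
from_x = res2.loc["X"]

# One cell: row 1 of the frame labelled "Y", column "sname".
one_cell = res2.loc[("Y", 1), "sname"]

print(from_x)
print(one_cell)
```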
Example 3:
# Code
print("\n-concat dataf1 and dataf2 using .concat() along column wise-\n")
res2 = pd.concat([dataf1, dataf2], axis=1)
print(res2)
-concat dataf1 and dataf2 using .concat() along column wise-
  sroll  sname  sage  marks sroll   sname  sage  marks
0    S0  Kamal    15     78    S4   Kimmi    13     78
1    S1  Princ    14     90    S4  Pranav    15     90
2    S2  Gagan    16     76    S6  Pankaj    17     76
3    S3   Amit    17     56    S7   Sumit    18     56
[Figure: the left four columns are the data of dataf1 and the right four columns are the data of dataf2 after concatenation]
Merging combines all the columns from the two tables or DataFrames, matching
rows on their common columns.
[Figure: left join, right join, inner join, outer join]
The Pandas DataFrame merge() function is used to merge two DataFrame
objects with a database-style join operation. The joining is performed on columns or
indexes.
Merging DataFrames using the how argument:
* We use the how argument of merge() to specify which keys are to be
included in the resulting table.
* If a key combination does not appear in either the left or right tables, the values
in the joined table will be NA.
* Here is a summary of the how options and their SQL equivalent names:
Merge method   SQL join name      Description
left           LEFT OUTER JOIN    Use keys from left frame only
right          RIGHT OUTER JOIN   Use keys from right frame only
outer          FULL OUTER JOIN    Use union of keys from both frames
inner          INNER JOIN         Use intersection of keys from both frames
[Figure: inner join and outer join diagrams]
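One way to check which side each key came from is the indicator argument of merge(), which adds a _merge column to the result. This is a small sketch and not part of the original notes; the frames `left` and `right`, the column name `key`, and the values are all made up for the illustration.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column; values are illustrative.
left = pd.DataFrame({"key": ["S0", "S1", "S2"],
                     "sname": ["Kamal", "Princ", "Gagan"]})
right = pd.DataFrame({"key": ["S0", "S2", "S3"],
                      "sclass": ["X", "XI", "XII"]})

# how='outer' keeps the union of keys; indicator=True records their origin
# in a "_merge" column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(left, right, how="outer", on="key", indicator=True)
print(merged)

# Count how many keys fall in each category.
origin_counts = merged["_merge"].value_counts().to_dict()
print(origin_counts)
```

Rows marked 'left_only' or 'right_only' are exactly the ones whose missing columns are filled with NaN.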
Example 1: Simple merge of two DataFrames on one unique key
Suppose we have two DataFrames in which SRoll is a unique key:
# using .merge() function
# we are using .merge() with one unique key combination
print("\n-we are using .merge() with one unique")
print("key combination-\n")
datares = pd.merge(dataf1, dataf2, on='SRoll')
print(datares)
dataf1: left                   dataf2: right
  SRoll  sname  sage             SRoll     Sadd sclass
0    S0  Kamal    15           0    S0      Tkd      X
1    S1  Princ    14           1    S1  Khanpur     IX
2    S2  Gagan    16           2    S2  Kalkaji     XI
3    S3   Amit    17           3    S3    Gk-II    XII
-we are using .merge() with one unique
key combination-
  SRoll  sname  sage     Sadd sclass
0    S0  Kamal    15      Tkd      X
1    S1  Princ    14  Khanpur     IX
2    S2  Gagan    16  Kalkaji     XI
3    S3   Amit    17    Gk-II    XII
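Because this example relies on SRoll being unique in both frames, it can be useful to let pandas check that assumption. The sketch below rebuilds the two tables shown above and passes validate="one_to_one" to merge(); the validate check and the deliberately duplicated row are additions for illustration, not part of the original notes.

```python
import pandas as pd

dataf1 = pd.DataFrame({"SRoll": ["S0", "S1", "S2", "S3"],
                       "sname": ["Kamal", "Princ", "Gagan", "Amit"],
                       "sage": [15, 14, 16, 17]})
dataf2 = pd.DataFrame({"SRoll": ["S0", "S1", "S2", "S3"],
                       "Sadd": ["Tkd", "Khanpur", "Kalkaji", "Gk-II"],
                       "sclass": ["X", "IX", "XI", "XII"]})

# validate="one_to_one" makes merge() raise an error if SRoll repeats
# in either frame, instead of silently producing extra rows.
datares = pd.merge(dataf1, dataf2, on="SRoll", validate="one_to_one")
print(datares)

# Hypothetical failure case: repeat the first row of dataf2 so S0 appears twice.
dup = pd.concat([dataf2, dataf2.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    pd.merge(dataf1, dup, on="SRoll", validate="one_to_one")
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```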
Example 2: Merging DataFrames using multiple join keys (SRoll1 and SRoll2)
# using .merge() function
# we are using .merge() with multiple key combination
print("\n-we are using .merge() with multiple key")
print("combination-\n")
datares = pd.merge(dataf1, dataf2, on=['SRoll1', 'SRoll2'])
print(datares)
dataf1: left                          dataf2: right
  SRoll1 SRoll2  sname  sage            SRoll1 SRoll2     Sadd sclass
0     S0     S0  Kamal    15          0     S0     S0      Tkd      X
1     S1     S1  Princ    14          1     S1     S0  Khanpur     IX
2     S2     S0  Gagan    16          2     S2     S0  Kalkaji     XI
3     S3     S1   Amit    17          3     S3     S0    Gk-II    XII
-we are using .merge() with multiple key
combination-
  SRoll1 SRoll2  sname  sage     Sadd sclass
0     S0     S0  Kamal    15      Tkd      X
1     S2     S0  Gagan    16  Kalkaji     XI
Example 3: Now we set how='left' in order to use keys from the left frame only.
# Code 3: Now we set how='left' in order to use keys from the left frame only.
# using keys from left frame
print("\n-using keys from left frame-\n")
res = pd.merge(dataf1, dataf2, how='left', on=['SRoll1', 'SRoll2'])
print(res)
dataf1: left                          dataf2: right
  SRoll1 SRoll2  sname  sage            SRoll1 SRoll2     Sadd sclass
0     S0     S0  Kamal    15          0     S0     S0      Tkd      X
1     S1     S1  Princ    14          1     S1     S0  Khanpur     IX
2     S2     S0  Gagan    16          2     S2     S0  Kalkaji     XI
3     S3     S1   Amit    17          3     S3     S0    Gk-II    XII
Left Join
-using keys from left frame-
  SRoll1 SRoll2  sname  sage     Sadd sclass
0     S0     S0  Kamal    15      Tkd      X
1     S1     S1  Princ    14      NaN    NaN
2     S2     S0  Gagan    16  Kalkaji     XI
3     S3     S1   Amit    17      NaN    NaN
Example 4: Now we set how='right' in order to use keys from the right frame only.
# Code 4: Now we set how='right' in order to use keys from the right frame only.
# using keys from right frame
print("\n-using keys from right frame-\n")
res = pd.merge(dataf1, dataf2, how='right', on=['SRoll1', 'SRoll2'])
print(res)
dataf1: left                          dataf2: right
  SRoll1 SRoll2  sname  sage            SRoll1 SRoll2     Sadd sclass
0     S0     S0  Kamal    15          0     S0     S0      Tkd      X
1     S1     S1  Princ    14          1     S1     S0  Khanpur     IX
2     S2     S0  Gagan    16          2     S2     S0  Kalkaji     XI
3     S3     S1   Amit    17          3     S3     S0    Gk-II    XII
Right Join
-using keys from right frame-
  SRoll1 SRoll2  sname  sage     Sadd sclass
0     S0     S0  Kamal  15.0      Tkd      X
1     S2     S0  Gagan  16.0  Kalkaji     XI
2     S1     S0    NaN   NaN  Khanpur     IX
3     S3     S0    NaN   NaN    Gk-II    XII
Example 5: Now we set how='outer' in order to get the union of keys from both DataFrames.
# Code 5: Now we set how='outer' in order to get the union of keys
# from both dataframes.
print("\n# getting union of keys\n")
res = pd.merge(dataf1, dataf2, how='outer', on=['SRoll1', 'SRoll2'])
print(res)
dataf1: left                          dataf2: right
  SRoll1 SRoll2  sname  sage            SRoll1 SRoll2     Sadd sclass
0     S0     S0  Kamal    15          0     S0     S0      Tkd      X
1     S1     S1  Princ    14          1     S1     S0  Khanpur     IX
2     S2     S0  Gagan    16          2     S2     S0  Kalkaji     XI
3     S3     S1   Amit    17          3     S3     S0    Gk-II    XII
Outer Join
# getting union of keys
  SRoll1 SRoll2  sname  sage     Sadd sclass
0     S0     S0  Kamal  15.0      Tkd      X
1     S1     S1  Princ  14.0      NaN    NaN
2     S2     S0  Gagan  16.0  Kalkaji     XI
3     S3     S1   Amit  17.0      NaN    NaN
4     S1     S0    NaN   NaN  Khanpur     IX
5     S3     S0    NaN   NaN    Gk-II    XII
Example 6: Now we set how='inner' in order to get the intersection of keys from
both DataFrames.
# Code 6: Now we set how='inner' in order to get the intersection of
# keys from both dataframes, i.e. only keys present in both frames.
print("\n-getting intersection of keys-\n")
res = pd.merge(dataf1, dataf2, how='inner', on=['SRoll1', 'SRoll2'])
print(res)
dataf1: left                          dataf2: right
  SRoll1 SRoll2  sname  sage            SRoll1 SRoll2     Sadd sclass
0     S0     S0  Kamal    15          0     S0     S0      Tkd      X
1     S1     S1  Princ    14          1     S1     S0  Khanpur     IX
2     S2     S0  Gagan    16          2     S2     S0  Kalkaji     XI
3     S3     S1   Amit    17          3     S3     S0    Gk-II    XII
Inner Join
-getting intersection of keys-
  SRoll1 SRoll2  sname  sage     Sadd sclass
0     S0     S0  Kamal    15      Tkd      X
1     S2     S0  Gagan    16  Kalkaji     XI
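The four join types can be compared in one run. The sketch below rebuilds dataf1 and dataf2 from the tables shown in the examples (the notes do not list the code that creates them, so this reconstruction is an assumption) and records how many rows each value of how produces.

```python
import pandas as pd

# Reconstructed from the tables printed in Examples 2 to 6.
dataf1 = pd.DataFrame({"SRoll1": ["S0", "S1", "S2", "S3"],
                       "SRoll2": ["S0", "S1", "S0", "S1"],
                       "sname": ["Kamal", "Princ", "Gagan", "Amit"],
                       "sage": [15, 14, 16, 17]})
dataf2 = pd.DataFrame({"SRoll1": ["S0", "S1", "S2", "S3"],
                       "SRoll2": ["S0", "S0", "S0", "S0"],
                       "Sadd": ["Tkd", "Khanpur", "Kalkaji", "Gk-II"],
                       "sclass": ["X", "IX", "XI", "XII"]})

# Merge once per join type and record the number of rows in each result.
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    res = pd.merge(dataf1, dataf2, how=how, on=["SRoll1", "SRoll2"])
    row_counts[how] = len(res)

print(row_counts)
```

Only the key pairs (S0, S0) and (S2, S0) occur in both frames, which is why the inner join has two rows while the outer join has six. The row order of right and outer results can differ between pandas versions, so the sketch compares row counts rather than row order.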
For more reference on join, merge and concat:
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
The CSV (Comma Separated Values) format is one of the most
popular file formats used to store and transfer data between
different programs. Currently, many database management tools
and the popular Excel offer data import and export in this format.
The CSV file is a plain text file with the .csv extension. A typical
file contains comma-separated values, but other separators such
as semicolon or tab are also allowed. It should be emphasized
that only one type of separator can be used in one CSV file.
Each line in the file represents a certain set of data. Optionally, in
the first line we can put a header that describes this data. Let's
look at a simple example of a file called details.csv that stores
contacts from a phone:
Name,Phone
mother,9990004561
father,9985672340
wife,9898234580
mother-in-law,0920486745
In the above file, there are four contacts consisting of name and phone
number. Note that the first line contains a header to help you interpret
the data.
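The contact list above can be loaded without creating a file by wrapping the text in io.StringIO. This is a minimal sketch: the choice of dtype=str (so that the leading zero of the last phone number is kept) and the semicolon-separated variant are illustrations added here, not steps from the notes.

```python
import io
import pandas as pd

# The details.csv content from above, held in a string instead of a file.
csv_text = """Name,Phone
mother,9990004561
father,9985672340
wife,9898234580
mother-in-law,0920486745
"""

# dtype=str keeps phone numbers as text, so the leading zero survives.
contacts = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(contacts)

# The same data with semicolons as the separator, read with sep=";".
semi_text = csv_text.replace(",", ";")
contacts_semi = pd.read_csv(io.StringIO(semi_text), sep=";", dtype=str)
print(contacts_semi)
```

If the Phone column were read as numbers instead, 0920486745 would lose its leading zero.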
To create a CSV file using Notepad and MS Excel
Steps to create CSV file using Notepad
1. Start Notepad. Create a table with three records, where each record
has two fields. For example, type “mother,9990004561” (without
quotation marks) on the first line, “father,9985672340” on the second
line and “wife,9898234580” on the third line.
2. Open the “File” menu and select “Save As.” In the File Name box, type
a file name that ends with a CSV extension. For example, type
“contacts.csv.”
3. Click the “Save as Type” drop-down list and select “All Files.” Click
"Save." Test the file by opening it inside a spreadsheet.
Steps to create CSV file using MS Excel
1. Start Microsoft Excel and add data to a new spreadsheet. For example,
type “32,” “19” and “8” in cells “A1,” “A2” and “A3,” respectively.
2. Click the “File” tab on the ribbon and then choose “Save As.” Click the
arrow next to “Save as Type” and choose “CSV (Comma Delimited)” from
the drop-down list.
3. Change the file name to one you prefer. Select the location to save the
file, then click the “Save” button. Click "OK" to save only the active sheet.
Click "Yes" to save the file in CSV format.
Suppose we have a 'StudentDetails.csv' file:
201 Amit
208 Rama
105 Neeta
4106 Amjad
209 Raja
1310 Pawan
1123 Sudhir
Method
Using the read_csv() function from the pandas package, you can import tabular
data from CSV files into a pandas DataFrame by passing the file name as a
parameter (e.g. pd.read_csv("filename.csv")).
Remember that you gave pandas an alias (pd), so you will use pd to
call pandas functions. Be sure to update the path to the CSV file to match your home
directory.
# Created By: Amjad Khan
# Importing/Exporting Data between CSV files and Data Frames.
import pandas as pd   # importing pandas module
import csv            # import the module csv
'''Method 1: Importing csv file (using read_csv() method)'''
# Import csv file and make a data frame
df1 = pd.read_csv(r"D:/Amjad_Pandas_Programs/StudentDetails.csv")
print(df1.head(5))
# Created By: Amjad Khan
# Importing/Exporting Data between CSV files and Data Frames.
import pandas as pd   # importing pandas module
import csv            # import the module csv
'''Method 2: Importing CSV file (using csv.reader() module)'''
# open the csv file
with open(r"D:/Amjad_Pandas_Programs/BoolStudentDetails.csv") as csv_file:
    # read the csv file
    csv_reader = csv.reader(csv_file, delimiter=',')
    # now we can use this csv file in pandas: read all rows while the
    # file is still open; each CSV row becomes one DataFrame row and
    # the columns are labelled 0, 1, 2, ...
    df2 = pd.DataFrame(list(csv_reader), index=None)
print(df2.head())
# iterating over the values of each column
for i in df2.columns:
    for val in list(df2[i]):
        print(val)
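When the first row of the file is a header, it can be taken from the reader before the remaining rows are collected. The sketch below is an illustration under stated assumptions: an in-memory io.StringIO object stands in for the opened file, and the column names and rows are made up.

```python
import csv
import io
import pandas as pd

# In-memory stand-in for an opened CSV file; the rows are illustrative.
csv_file = io.StringIO("sid,sname\n101,Amit\n102,Sumit\n103,Aman\n")

csv_reader = csv.reader(csv_file, delimiter=",")
header = next(csv_reader)   # first row holds the column names
rows = list(csv_reader)     # remaining rows, each a list of strings

df = pd.DataFrame(rows, columns=header)

# csv.reader yields every value as a string, so convert numeric columns.
df["sid"] = df["sid"].astype(int)
print(df)
```

Unlike read_csv(), csv.reader() does no type detection, which is why the explicit conversion is needed.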
# Created By: Amjad Khan
# Importing/Exporting Data between CSV files and Data Frames.
import pandas as pd   # importing pandas module
'''Exporting/Saving a Pandas DataFrame as a CSV'''
# Method 1: Save csv to working directory.
# list of name, degree, score
nme = ["aparna", "pankaj", "sumit", "Geeku"]
deg = ["XI", "X", "XII", "X"]
scr = [90, 40, 80, 98]
# dictionary of lists
data = {'name': nme, 'degree': deg, 'score': scr}
df3 = pd.DataFrame(data)
# saving the dataframe
df3.to_csv('file1.csv')
# Created By: Amjad Khan
# Importing/Exporting Data between CSV files and Data Frames.
import pandas as pd   # importing pandas module
# Method 2: Saving CSV without headers and index.
# list of name, degree, score
nme = ["aparna", "pankaj", "sumit", "Geeku"]
deg = ["XI", "X", "XII", "X"]
scr = [90, 40, 80, 98]
# dictionary of lists
data = {'name': nme, 'degree': deg, 'score': scr}
df4 = pd.DataFrame(data)
# saving the dataframe
df4.to_csv('file2.csv', header=False, index=False)
# Created By: Amjad Khan
# Importing/Exporting Data between CSV files and Data Frames.
import pandas as pd   # importing pandas module
# Method 3: Save csv file to a specified location.
# list of name, degree, score
nme = ["aparna", "pankaj", "sumit", "Geeku"]
deg = ["XI", "X", "XII", "X"]
scr = [90, 40, 80, 98]
# dictionary of lists
data = {'name': nme, 'degree': deg, 'score': scr}
df5 = pd.DataFrame(data)
# saving the dataframe
df5.to_csv(r'C:\Users\Admin\Desktop\file3.csv', index=False)
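To compare what the three methods write without creating any files, to_csv() can be called without a path, in which case it returns the CSV text as a string. The sketch below uses a shortened two-row version of the data (an assumption made to keep the output small) and reads one of the results back with read_csv().

```python
import io
import pandas as pd

# Shortened two-row version of the data used above.
df = pd.DataFrame({"name": ["aparna", "pankaj"],
                   "degree": ["XI", "X"],
                   "score": [90, 40]})

with_index = df.to_csv()                     # like Method 1: header and index
bare = df.to_csv(header=False, index=False)  # like Method 2: data rows only
without_index = df.to_csv(index=False)       # like Method 3: header, no index

print(with_index)
print(bare)
print(without_index)

# Reading the index-free text back gives the original columns and values.
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```

In the first output the header line starts with a comma, because the unnamed index is written as an extra first column.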