Class 11-Dataframes-Part 3
ARTIFICIAL INTELLIGENCE
PANDAS DATAFRAMES
Part-3
DELETING / Dropping columns from a dataframe
Method 1: del dataframename['col name']
Method 2: dataframename.pop('col name')
Method 3: dataframename.drop()
Syntax:
<dataframe>.drop([<name list>], axis)
where the default value for axis is 0 (row), used when you want a row to be deleted.
For a column, axis is 1, used when you wish to delete a column.
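The three methods listed above can be compared side by side. This is a minimal sketch; the dataframe and its column names are made up only for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
data = {"name": ["asha", "ravi"], "age": [15, 16],
        "city": ["Pune", "Delhi"], "marks": [88, 92]}
df = pd.DataFrame(data)

# Method 1: del removes the column from df itself.
del df["city"]

# Method 2: pop() removes the column from df and returns it as a Series.
ages = df.pop("age")

# Method 3: drop() returns a new dataframe unless inplace=True is given.
df_small = df.drop(["marks"], axis=1)

print(list(df.columns))        # ['name', 'marks']
print(list(df_small.columns))  # ['name']
print(list(ages))              # [15, 16]
```

Note that del and pop() always change the dataframe itself, while drop() changes it only when inplace=True is used.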
Dropping columns from a dataframe by name
• more than 1 column can be deleted
Axis has to be specified
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])
print(df1)
df1.drop("product_code", axis=1,inplace=True)
print(df1)
  product_code  price
0          a01    100
1          a02    200
2          a03    300
   price
0    100
1    200
2    300
inplace=True means permanent deletion from the dataframe.
By default its value is False.
So if inplace=True is not mentioned, then the dataframe remains unaffected.
df.drop(columns=["hobby","grade"], inplace=True)
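The effect of leaving out inplace=True, and the columns= form of drop(), can be checked with the product data from this section. This is only a sketch; the variable names result and same are chosen for the illustration.

```python
import pandas as pd

product = {"product_code": ["a01", "a02", "a03"], "price": [100, 200, 300]}
df1 = pd.DataFrame(product, columns=["product_code", "price"])

# Without inplace=True, drop() returns a new dataframe and df1 keeps both columns.
result = df1.drop(columns=["product_code"])
print(list(df1.columns))     # ['product_code', 'price']
print(list(result.columns))  # ['price']

# columns=[...] gives the same result as passing the labels with axis=1.
same = df1.drop(["product_code"], axis=1)
print(result.equals(same))   # True
```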
Dropping columns from a dataframe by index
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product, columns=["product_code","price"])
print(df1)
df1.drop(df1.columns[1], axis=1,inplace=True)   # column positions start at 0, so 1 is "price"
print(df1)
  product_code  price
0          a01    100
1          a02    200
2          a03    300
  product_code
0          a01
1          a02
2          a03
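More than one column can also be dropped by position. The following is a small sketch that assumes a shortened version of the grading data; the names to_drop and df2 are chosen only for the illustration.

```python
import pandas as pd

grading = {"Name": ["rashmi", "harsh", "ganes"],
           "grade": ["a1", "a2", "b1"],
           "class": ["1", "2", "3"]}
df = pd.DataFrame(grading)

# Positions start at 0, so columns[[1, 2]] selects "grade" and "class".
to_drop = df.columns[[1, 2]]
df2 = df.drop(to_drop, axis=1)

print(list(to_drop))      # ['grade', 'class']
print(list(df2.columns))  # ['Name']
```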
import pandas as pd
grading={"Name":["rashmi","harsh","ganes","priya","vivek","anita","kartik"],
"grade":["a1","a2","b1","a1","b2","a2","a1"],"class":["1","2","3","4","5","6","7"]}
df=pd.DataFrame(grading,index=["a","b","c","d","e","f","g"])
df.drop(labels=["grade","class"] ,axis=1,inplace=True)
print(df)
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])
print(df1)
df1=df1.drop([1,2],axis=0)   # drops rows 1 and 2; or use df1.drop([1,2],axis=0,inplace=True)
print(df1)
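Rows are dropped by their index labels, so the same call works when the index holds letters instead of numbers. A minimal sketch, assuming a shortened version of the grading data:

```python
import pandas as pd

grading = {"Name": ["rashmi", "harsh", "ganes", "priya"],
           "grade": ["a1", "a2", "b1", "a1"]}
df = pd.DataFrame(grading, index=["a", "b", "c", "d"])

# axis=0 is the default, so it can be left out when dropping rows.
remaining = df.drop(["b", "d"])

print(list(remaining.index))  # ['a', 'c']
print(len(df))                # 4, because inplace=True was not used
```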
SORTING DATA IN A DATAFRAME
import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}
df1=pd.DataFrame(product,columns=["product_code","qty"])
df1.sort_values(by=["qty"])   # or df1.sort_values(["qty"])
print(df1)
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000
The original dataframe remains unaltered, because sort_values() returns a new sorted dataframe and the result is not stored here.
import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}
df1=pd.DataFrame(product,columns=["product_code","qty"])
print(df1)
df1.sort_values(by=["qty"],inplace=True)   # OR: df1=df1.sort_values(by=["qty"])
print(df1)
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000
  product_code   qty
2          a03  1000
0          a01  3000
1          a02  4000
Either use inplace for permanent sorting or assign the sorted dataframe back to a dataframe variable.
SORTING DATA IN A DATAFRAME IN DESCENDING ORDER
import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}
df1=pd.DataFrame(product,columns=["product_code","qty"])
print(df1)
print(df1.sort_values(by=["qty"], ascending=False))
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000
  product_code   qty
1          a02  4000
0          a01  3000
2          a03  1000
Handling missing values
In Pandas, a missing value is denoted by NaN (Not a Number). There are various
operations that we can perform on these NaN values in our dataframes.
Consider the following dataframe to be used in all the following examples involving NaN values.
import pandas as pd
import numpy as np
dict1={'names':['sush','adarsh','ravi','manu','simar'],'phy':[34,np.nan,56,67,np.nan],
'chem':[78,90,np.nan,np.nan,np.nan],'eng':[50,55,67,68,69],'class':[9,10,10,11,11]}
df=pd.DataFrame(dict1,index=[100,101,102,103,104])
print(df)
CHECKING FOR null or not null values in a dataframe
CHECKING FOR notnull values in a dataframe
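The checks named in these headings can be sketched with the isnull() and notnull() methods of a dataframe, using the same data as above. The variable missing_per_column is a name chosen only for this illustration.

```python
import pandas as pd
import numpy as np

dict1 = {"names": ["sush", "adarsh", "ravi", "manu", "simar"],
         "phy": [34, np.nan, 56, 67, np.nan],
         "chem": [78, 90, np.nan, np.nan, np.nan],
         "eng": [50, 55, 67, 68, 69],
         "class": [9, 10, 10, 11, 11]}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104])

# isnull() gives True where a value is missing; notnull() gives the opposite.
print(df.isnull())
print(df.notnull())

# Adding up the True values counts the missing entries in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```

Both methods return a dataframe of True/False values with the same shape as the original.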
FILLING MISSING VALUES (NaN) WITH A PARTICULAR VALUE.
import pandas as pd
import numpy as np
dict1={'names':['sush','adarsh','ravi','manu','sushma'],
'phy':[34,np.nan,56,67,np.nan],
'chem':[78,90,np.nan,np.nan,np.nan],
'eng':[50,55,67,68,69],
'class':[9,10,10,11,11]}
df=pd.DataFrame(dict1,index=[100,101,102,103,104])
print(df)
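The filling step itself can be sketched with the fillna() method. The fill values 0 and -1 below are chosen only for illustration, and filled and filled_by_column are hypothetical variable names.

```python
import pandas as pd
import numpy as np

dict1 = {"names": ["sush", "adarsh", "ravi", "manu", "sushma"],
         "phy": [34, np.nan, 56, 67, np.nan],
         "chem": [78, 90, np.nan, np.nan, np.nan],
         "eng": [50, 55, 67, 68, 69],
         "class": [9, 10, 10, 11, 11]}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104])

# Replace every NaN with 0; df itself is unchanged because inplace is not used.
filled = df.fillna(0)
print(filled)

# A dictionary gives a different fill value for each column.
filled_by_column = df.fillna({"phy": 0, "chem": -1})
print(filled_by_column)
```

As with drop() and sort_values(), either assign the result to a variable or pass inplace=True to make the change permanent.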
CSV file handling using dataframes
import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}
df1=pd.DataFrame(product,columns=["product_code","qty"])
df1.to_csv("d:\\abc.csv")   # writes the dataframe to the file; nothing needs to be printed
READING A CSV FILE INTO A DATAFRAME
Using the read_csv() function, you can import tabular data from
CSV files into a pandas dataframe by passing the file name,
with its correct path, as a parameter.
import pandas as pd
df=pd.read_csv('c:\\abc.csv')
To display data imported into a dataframe from a CSV file:
df=pd.read_csv('c:\\yield_df.csv')
x=int(input("how many top records"))
print (df.head(x))
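Writing and reading can be tried together in one run. The sketch below assumes a temporary folder in place of the d:\ and c:\ paths used above, so that it works on any computer; index=False is an optional setting that leaves the row labels out of the file.

```python
import os
import tempfile
import pandas as pd

product = {"product_code": ["a01", "a02", "a03"], "qty": [3000, 4000, 1000]}
df1 = pd.DataFrame(product, columns=["product_code", "qty"])

# A temporary folder stands in for the drive paths used in the slides.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "abc.csv")
    # index=False leaves the row labels 0, 1, 2 out of the CSV file.
    df1.to_csv(path, index=False)
    df2 = pd.read_csv(path)

# head(2) returns the first two records of the dataframe that was read back.
print(df2.head(2))
```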