Class 11-Dataframes-Part 3

This document provides an overview of manipulating dataframes in Pandas, including methods for deleting columns and rows, sorting data, handling missing values, and reading/writing CSV files. It explains various syntax and examples for dropping columns by name or index, sorting data in ascending or descending order, and filling missing values. Additionally, it covers how to import data from CSV files into dataframes and export dataframes to CSV files.

CLASS 11

ARTIFICIAL INTELLIGENCE
PANDAS DATAFRAMES
Part-3
DELETING / Dropping columns from a dataframe

Method 1: del dataframename['col name']
Method 2: dataframename.pop('col name')
Method 3: dataframename.drop()

Syntax:
<dataframe>.drop([<name list>], axis)

The default value for axis is 0 (row); use it when you want a row to be deleted.
For a column, axis is 1; use it when you wish to delete a column.
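
The three methods listed above can be compared side by side. This is a minimal sketch, not code from the slides: the "qty" column is added here only so that each method has a column to remove.

```python
import pandas as pd

# "qty" is an extra column added for this illustration only
product = {"product_code": ["a01", "a02", "a03"],
           "price": [100, 200, 300],
           "qty": [3000, 4000, 1000]}
df1 = pd.DataFrame(product)

# Method 1: del removes the column from df1 itself
del df1["qty"]

# Method 2: pop() removes the column from df1 and returns it as a Series
prices = df1.pop("price")

# Method 3: drop() returns a new dataframe; df1 keeps its remaining column
remaining = df1.drop(["product_code"], axis=1)

print(list(df1.columns))
print(prices.tolist())
print(list(remaining.columns))
```

del and pop() change the dataframe directly, while drop() leaves it alone unless the result is assigned or inplace=True is used.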
Dropping columns from a dataframe by name
• More than one column can be deleted
• Axis has to be specified (axis=1 for columns)

import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])
print(df1)

X=df1.drop("product_code", axis=1) #assign it to a dataframe for permanent change
print(X)

axis=1 means column

Output of print(df1):
  product_code  price
0          a01    100
1          a02    200
2          a03    300

Output of print(X):
   price
0    100
1    200
2    300

To drop more than one column:
df1.drop(["product_code","price"],axis=1)
Dropping columns from a dataframe by name
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])
print(df1)
df1.drop("product_code", axis=1,inplace=True)
print(df1)

Output of the first print(df1):
  product_code  price
0          a01    100
1          a02    200
2          a03    300

Output of the second print(df1):
   price
0    100
1    200
2    300

inplace=True means permanent deletion from the dataframe.
By default its value is False.
So if inplace=True is not mentioned, then the dataframe remains unaffected.

To drop more than one column:

df1.drop(["product_code","price"],axis=1)
OR
df.drop(columns=['hobby',"grade"], inplace=True) #for a dataframe df that has columns named hobby and grade
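
The effect of inplace can be checked directly. The sketch below reuses the product dataframe from the slides; the variable names dropped, result, cols_before and cols_after are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"product_code": ["a01", "a02", "a03"],
                    "price": [100, 200, 300]})

# Without inplace=True, drop() returns a new dataframe and df1 is untouched
dropped = df1.drop(columns=["price"])
cols_before = list(df1.columns)

# With inplace=True, df1 itself changes and drop() returns None
result = df1.drop(columns=["price"], inplace=True)
cols_after = list(df1.columns)

print(cols_before)
print(cols_after)
print(result)
```

Because drop() returns None when inplace=True is used, writing df1 = df1.drop(..., inplace=True) would replace the dataframe with None; use one approach or the other, not both.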
Dropping columns from a dataframe by index
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product, columns=["product_code","price"])
print(df1)

df1.drop(df1.columns[1], axis=1,inplace=True) #column positions start at 0, so position 1 is "price"
print(df1)

product_code price
0 a01 100
1 a02 200
2 a03 300

product_code
0 a01
1 a02
2 a03
import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])

X=df1.drop(columns=["product_code"]) #axis=1 not required


print(X)
Dropping multiple columns from a dataframe by index

import pandas as pd
grading={"Name":["rashmi","harsh","ganes","priya","vivek","anita","kartik"],"grade":[
"a1","a2","b1","a1","b2","a2","a1"],"class":["1","2","3","4","5","6","7"]}
df=pd.DataFrame(grading,index=["a","b","c","d","e","f","g"])
df.drop(labels=["grade","class"] ,axis=1,inplace=True)
print(df)

Dataframe before the drop:
     Name grade class
a  rashmi    a1     1
b   harsh    a2     2
c   ganes    b1     3
d   priya    a1     4
e   vivek    b2     5
f   anita    a2     6
g  kartik    a1     7

Output of print(df) after the drop:
     Name
a  rashmi
b   harsh
c   ganes
d   priya
e   vivek
f   anita
g  kartik
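
The example above passes column labels to drop(). To drop several columns by position instead, one possible approach is to look up the labels through df.columns first. This is a sketch with a shortened version of the grading data; the variable name to_drop is an assumption made for illustration.

```python
import pandas as pd

grading = {"Name": ["rashmi", "harsh", "ganes"],
           "grade": ["a1", "a2", "b1"],
           "class": ["1", "2", "3"]}
df = pd.DataFrame(grading, index=["a", "b", "c"])

# df.columns[[1, 2]] gives the labels at positions 1 and 2 ("grade" and "class")
to_drop = df.columns[[1, 2]]
df.drop(to_drop, axis=1, inplace=True)

print(df)
```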
REMOVING ROWS FROM DATAFRAME
DROPPING ROWS-using index

import pandas as pd
product={"product_code":['a01','a02','a03'],"price":[100,200,300]}
df1=pd.DataFrame(product,columns=["product_code","price"])
print(df1)

df1=df1.drop([1,2],axis=0) #or: df1.drop([1,2],axis=0,inplace=True)
print(df1)

By default, if axis is not specified, it is row.

As the deletion is temporary, to make it permanent:
a. Assign it to a variable
OR
b. Use inplace=True

Output of the first print(df1):
  product_code  price
0          a01    100
1          a02    200
2          a03    300

Output of the second print(df1):
  product_code  price
0          a01    100
Sorting data
SORTING DATA IN A DATAFRAME
import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}

df1=pd.DataFrame(product,columns=["product_code","qty"])

print(df1)

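df1.sort_values(by=["qty"]) #or: df1.sort_values(["qty"])

print(df1)

Output of the first print(df1):
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000

Output of the second print(df1):
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000

The original dataframe remains unaltered, because the sorted result was neither assigned nor sorted in place.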
SORTING DATA IN A DATAFRAME

import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}

df1=pd.DataFrame(product,columns=["product_code","qty"])
print(df1)

df1.sort_values(by=["qty"],inplace=True)

OR

df1=df1.sort_values(by=["qty"])

Either use inplace=True for permanent sorting or assign the sorted dataframe to a dataframe.

Dataframe before sorting (output of print(df1)):
  product_code   qty
0          a01  3000
1          a02  4000
2          a03  1000

Dataframe after sorting:
  product_code   qty
2          a03  1000
0          a01  3000
1          a02  4000
SORTING DATA IN A DATAFRAME IN DESCENDING ORDER

import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}

df1=pd.DataFrame(product,columns=["product_code","qty"])

print(df1)

print(df1.sort_values(by=["qty"], ascending=False))

product_code qty
0 a01 3000
1 a02 4000
2 a03 1000

product_code qty
1 a02 4000
0 a01 3000
2 a03 1000
Handling missing values
In Pandas, a missing value is denoted by NaN (Not a Number). There are various
operations that we can perform on these NaN values in our dataframes.

Consider the following dataframe to be used in all the following examples involving NaN values.
import pandas as pd
import numpy as np
dict1={'names':['sush','adarsh','ravi','manu','simar'],'phy':[34,np.nan,56,67,np.nan],
'chem':[78,90,np.nan,np.nan,np.nan],'eng':[50,55,67,68,69],'class':[9,10,10,11,11]}
df=pd.DataFrame(dict1,index=[100,101,102,103,104])
print(df)
CHECKING FOR null or not null values in a dataframe
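
The examples for this heading did not survive in the text. As a sketch, assuming the slides relied on the standard pandas methods isnull() and notnull(), the check on the dataframe defined above could look like this. The variable names null_mask, notnull_mask and null_counts are introduced here for illustration.

```python
import pandas as pd
import numpy as np

dict1 = {'names': ['sush', 'adarsh', 'ravi', 'manu', 'simar'],
         'phy': [34, np.nan, 56, 67, np.nan],
         'chem': [78, 90, np.nan, np.nan, np.nan],
         'eng': [50, 55, 67, 68, 69],
         'class': [9, 10, 10, 11, 11]}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104])

# isnull() gives True wherever a value is missing (NaN)
null_mask = df.isnull()

# notnull() is the opposite: True wherever a value is present
notnull_mask = df.notnull()

# Summing the True values counts the missing entries in each column
null_counts = df.isnull().sum()

print(null_mask)
print(notnull_mask)
print(null_counts)
```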
FILLING MISSING VALUES (NaN) WITH A PARTICULAR VALUE.

import pandas as pd
import numpy as np
dict1={'names':['sush','adarsh','ravi','manu','sushma'],
'phy':[34,np.nan,56,67,np.nan],
'chem':[78,90,np.nan,np.nan,np.nan],
'eng':[50,55,67,68,69],
'class':[9,10,10,11,11]}
df=pd.DataFrame(dict1,index=[100,101,102,103,104])
print(df)
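
The code above only builds the dataframe; the filling step itself is not present in the text. A minimal sketch, assuming the pandas fillna() method is what was intended, is shown below. The fill values 0 and 1 and the names filled and per_column are chosen here only for illustration.

```python
import pandas as pd
import numpy as np

dict1 = {'names': ['sush', 'adarsh', 'ravi', 'manu', 'sushma'],
         'phy': [34, np.nan, 56, 67, np.nan],
         'chem': [78, 90, np.nan, np.nan, np.nan],
         'eng': [50, 55, 67, 68, 69],
         'class': [9, 10, 10, 11, 11]}
df = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104])

# Replace every NaN with the same value; a new dataframe is returned
filled = df.fillna(0)

# Replace NaN with a different value per column by passing a dictionary
per_column = df.fillna({'phy': 0, 'chem': 1})

print(filled)
print(per_column)
```

As with drop() and sort_values(), the original df still contains NaN unless the result is assigned back or inplace=True is used.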
CSV file handling using dataframes

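• Data can be imported (read) from a csv file into a dataframe and exported (written) from a dataframe to a csv file.
• To perform read and write operations with a CSV file, pandas provides the read_csv() and to_csv() functions. Python's separate csv module does not need to be imported for them.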
Exporting a DataFrame to a CSV file

• We can use the to_csv() function to write the data of a DataFrame to a csv file.
• dataframename.to_csv(path of the file)

Writing dataframe data to a csv file

import pandas as pd
product={"product_code":['a01','a02','a03'],"qty":[3000,4000,1000]}
df1=pd.DataFrame(product,columns=["product_code","qty"])
df1.to_csv("d:\\abc.csv") #to_csv() returns None when a path is given, so there is nothing to print
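
To see what to_csv() actually writes, the file can be read back as plain text. This sketch writes to the system's temporary folder rather than d:\ so that it runs on any machine; the file name abc_example.csv is hypothetical, and index=False is an optional setting that leaves the row index out of the file.

```python
import os
import tempfile
import pandas as pd

df1 = pd.DataFrame({"product_code": ["a01", "a02", "a03"],
                    "qty": [3000, 4000, 1000]})

# Hypothetical file location; substitute any writable path
csv_path = os.path.join(tempfile.gettempdir(), "abc_example.csv")

# index=False keeps the row labels 0, 1, 2 out of the file
df1.to_csv(csv_path, index=False)

# Read the file back as text to inspect its contents
with open(csv_path) as f:
    contents = f.read()

print(contents)
```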
READING A CSV FILE INTO A DATAFRAME

Using the read_csv() function, you can import tabular data from
CSV files into a pandas dataframe by specifying the file name
as a parameter value.

read_csv() is used to read the csv file with its correct path.

● sep specifies whether the values are separated by comma, semicolon, tab, or any other character.
The default value for sep is a comma.
● The parameter header gives the row number that holds the column names, and so marks the start of the data to be fetched.
header=0 implies that column names are inferred from the first line of the file.
By default, read_csv() behaves as if header=0 when no column names are passed.
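
The sep and header parameters can be tried without creating a file on disk. In this sketch, io.StringIO stands in for a file, and the sample text and the names df_a and df_b are made up for illustration.

```python
import io
import pandas as pd

# Values separated by semicolons, with column names on the first line
semicolon_text = "product_code;qty\na01;3000\na02;4000\n"
df_a = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# No header line in the data: header=None stops pandas from treating
# the first row as column names, and names supplies them instead
no_header_text = "a01,3000\na02,4000\n"
df_b = pd.read_csv(io.StringIO(no_header_text), header=None,
                   names=["product_code", "qty"])

print(df_a)
print(df_b)
```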
• You can create a csv file using Notepad or Excel, save it as a .csv file, and use it for extracting and processing the data.
• We can also download csv files from trusted websites and use them.
• To read the data from a csv file into a dataframe:

import pandas as pd
df=pd.read_csv('c:\\abc.csv')
To display data imported to a dataframe from a csv file:

head() - extracts the first 5 records of the dataframe (by default)
tail() - extracts the last 5 records of the dataframe (by default)

import pandas as pd
df1=pd.read_csv("d:\\abc.csv")
print(df1.head())

import pandas as pd
df=pd.read_csv('c:\\yield_df.csv')
x=int(input("how many top records"))
print(df.head(x))
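
head() and tail() behave the same way on any dataframe, so they can be tried without a csv file. The roll_no and marks data below are invented for this sketch.

```python
import pandas as pd

# Ten made-up records, used only to demonstrate head() and tail()
df = pd.DataFrame({"roll_no": list(range(1, 11)),
                   "marks": [56, 78, 90, 45, 67, 88, 92, 71, 60, 83]})

first_five = df.head()     # first 5 records by default
last_five = df.tail()      # last 5 records by default
first_three = df.head(3)   # first 3 records
last_two = df.tail(2)      # last 2 records

print(first_three)
print(last_two)
```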
