Practical 3

Uploaded by

bgmi.godflex
Copyright
© All Rights Reserved

Practical 3: Python Programming for Cleaning the Data

1 How to print the first rows (head) of a CSV file


import pandas as pd
# Load CSV file into a DataFrame
df = pd.read_csv('path/to/your/file.csv')
# Display the first few rows of the DataFrame
print(df.head())
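
The same call can be tried without a file on disk. This is a minimal sketch: the inline CSV text and its column names (Duration, Pulse) are invented for illustration. head(n) takes the number of rows to return and defaults to 5.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'path/to/your/file.csv'
csv_text = "Duration,Pulse\n60,110\n60,117\n45,109\n45,117\n30,103\n60,102\n"

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

first_three = df.head(3)  # first 3 rows instead of the default 5
print(first_three)
print(df.shape)           # (rows, columns) of the full DataFrame
```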

2 How to load and display a text file with NumPy


import numpy as np
# Load text file into a NumPy array
data = np.loadtxt('path/to/your/file.txt')
# Display the data
print(data)
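
np.loadtxt expects whitespace-separated numbers by default. For a comma-separated file with a header row, the delimiter and skiprows arguments are usually needed. The sketch below assumes such a layout; the sample text is made up.

```python
import io
import numpy as np

# Hypothetical file content: one header line, then comma-separated numbers
text = "duration,pulse\n60,110\n45,109\n30,103\n"

# delimiter="," splits on commas; skiprows=1 skips the header line
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

print(data.shape)         # (rows, columns)
print(data.mean(axis=0))  # mean of each column
```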

3 How to open and read a file


# Open and read the file
with open('path/to/your/file.txt', 'r') as file:
    data = file.read()

# Display the data
print(data)
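
file.read() returns the whole file as one string. When the data is line-oriented, iterating over the file object gives one line at a time. A small sketch, using a temporary file created only for this example:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# Iterate over the file to read it line by line, stripping the newline
with open(path, "r", encoding="utf-8") as file:
    lines = [line.rstrip("\n") for line in file]

os.remove(path)  # clean up the temporary file
print(lines)
```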

4 How to load a JSON file


import json
# Load JSON file
with open('path/to/your/file.json', 'r') as file:
    data = json.load(file)
# Display the data
print(data)
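
For cleaning, the loaded JSON is often converted to a DataFrame. The sketch below assumes the JSON is a list of records (one object per row); the field names are invented. json.loads parses a string, whereas json.load reads from an open file.

```python
import json
import pandas as pd

# Hypothetical JSON text: a list of records, one of them with a null value
json_text = '[{"Duration": 60, "Pulse": 110}, {"Duration": 45, "Pulse": null}]'

records = json.loads(json_text)  # JSON null becomes Python None
df = pd.DataFrame(records)       # None becomes a missing value (NaN)

print(df)
print(df.isna().sum())           # missing values per column
```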

5 How to load a pickle file


import pickle
# Load Pickle file
with open('path/to/your/file.pkl', 'rb') as file:
    data = pickle.load(file)
# Display the data
print(data)
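
A pickle file is produced by pickle.dump (or pickle.dumps for bytes in memory). The round trip below is a sketch with a made-up dictionary. Unpickling can execute arbitrary code, so only load pickle data from a source you trust.

```python
import pickle

# Hypothetical object to serialize
original = {"name": "sample", "values": [1, 2, 3]}

blob = pickle.dumps(original)  # serialize to bytes (what a .pkl file holds)
restored = pickle.loads(blob)  # deserialize back into a Python object

print(restored == original)    # same content
print(restored is original)    # but a separate object
```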

CLEANING DATA
1 Clean empty cells
import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())

# Notice in the result that some rows have been removed (rows 18, 22 and 28).
# These rows had cells with empty values.
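
Before dropping rows, it can help to count how many values are missing in each column. A minimal sketch with invented inline data in place of data.csv:

```python
import io
import pandas as pd

# Hypothetical CSV with two empty cells (row 2: Pulse, row 3: Duration)
csv_text = "Duration,Pulse\n60,110\n45,\n,103\n30,98\n"
df = pd.read_csv(io.StringIO(csv_text))

missing_per_column = df.isna().sum()  # count of empty cells per column
new_df = df.dropna()                  # new DataFrame without those rows

print(missing_per_column)
print(len(df), "rows before,", len(new_df), "rows after")
```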


2 Remove rows with null values in place

import pandas as pd
df = pd.read_csv('data.csv')
df.dropna(inplace = True)
print(df.to_string())
# Notice in the result that some rows have been removed (rows 18, 22 and 28).
# These rows had cells with empty values.
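
The difference between the two forms is what they return. dropna() returns a new DataFrame and leaves the original unchanged, while dropna(inplace=True) changes the original and returns None. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({"Duration": [60, None, 45], "Pulse": [110, 117, None]})

copy_result = df.dropna()                 # new DataFrame; df is unchanged
rows_before = len(df)

inplace_result = df.dropna(inplace=True)  # modifies df; returns None

print(rows_before, len(df), inplace_result)
```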

3 Convert to date:
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())
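
If the column contains text that is not a date, pd.to_datetime raises an error by default. Passing errors="coerce" turns unparseable entries into NaT (the missing value for dates) instead. A sketch with invented values:

```python
import pandas as pd

# Hypothetical Date column: two valid dates, one bad string, one empty cell
df = pd.DataFrame({"Date": ["2020-12-01", "2020-12-02", "not a date", None]})

# errors="coerce" converts anything unparseable to NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

print(df)
print(df["Date"].isna().sum())  # number of NaT entries
```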

4 Remove rows with a NULL value in the "Date" column:


import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.dropna(subset=['Date'], inplace = True)
print(df.to_string())
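
The subset argument restricts the check to the listed columns, so missing values in other columns are kept. A sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: row 1 has no Date, row 2 has no Pulse
df = pd.DataFrame({
    "Date": ["2020-12-01", None, "2020-12-03"],
    "Pulse": [110, 117, None],
})

df["Date"] = pd.to_datetime(df["Date"])   # None becomes NaT
df.dropna(subset=["Date"], inplace=True)  # drop only rows with no Date

print(df)  # row 2 stays, even though its Pulse is missing
```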

5 Replacing Values

import pandas as pd
df = pd.read_csv('data.csv')
df.loc[7,'Duration'] = 45
print(df.to_string())
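
df.loc selects by index label, not by position. With the default index the two coincide, but after rows have been dropped they may not. A sketch with an invented index to show this:

```python
import pandas as pd

# Hypothetical data whose index labels are 5, 7 and 9
df = pd.DataFrame({"Duration": [60, 450, 45]}, index=[5, 7, 9])

# 7 is the row label here; it is the second row by position
df.loc[7, "Duration"] = 45

print(df)
```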

To replace wrong data in larger data sets, loop over the rows and cap every "Duration" above 120:


import pandas as pd
df = pd.read_csv('data.csv')
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
print(df.to_string())
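
The same capping rule can be written without an explicit loop, which is usually faster on large data sets. The sketch below shows two equivalent options on made-up values: a boolean mask with df.loc, and Series.clip.

```python
import pandas as pd

# Hypothetical data with two values above the limit of 120
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Option 1: clip returns a new Series with values capped at 120
capped = df["Duration"].clip(upper=120)

# Option 2: a boolean mask selects the rows to overwrite in place
df.loc[df["Duration"] > 120, "Duration"] = 120

print(df["Duration"].tolist())
print(capped.tolist())
```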

6 Removing Duplicates
import pandas as pd
df = pd.read_csv('data.csv')
print(df.duplicated())
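
duplicated() returns one boolean per row: True when the row repeats an earlier row in every column. A sketch with invented values, where only the third row is a full duplicate:

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0; row 3 differs in Pulse
df = pd.DataFrame({
    "Duration": [60, 45, 60, 60],
    "Pulse": [110, 109, 110, 103],
})

flags = df.duplicated()  # True marks a repeat of an earlier row

print(flags.tolist())
print(flags.sum())       # number of duplicate rows
```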
Example: remove the duplicate rows
import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())

# Notice that row 12 has been removed from the result.
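
drop_duplicates also accepts subset (which columns to compare) and keep (which occurrence to retain). The sketch below uses made-up data and also resets the index so the row labels run 0, 1, 2 again after rows are removed.

```python
import pandas as pd

# Hypothetical data: row 2 is a full duplicate of row 0
df = pd.DataFrame({
    "Duration": [60, 45, 60, 60],
    "Pulse": [110, 109, 110, 103],
})

# Drop full-row duplicates, then renumber the index
deduped = df.drop_duplicates().reset_index(drop=True)

# Compare on Duration only and keep the last occurrence of each value
by_duration = df.drop_duplicates(subset=["Duration"], keep="last")

print(deduped)
print(by_duration)
```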
