Day-5 AI Code

The document provides a Python program that cleans a dataset by removing duplicate entries and filling missing values with the mean of their respective columns. It includes a sample dataset with duplicates and missing values and demonstrates the cleaning process using the pandas library. The output shows the original dataset followed by the cleaned dataset.

Uploaded by suhani
Copyright © All Rights Reserved

3/5/25, 6:11 PM Untitled2.ipynb - Colab

Write a Python program to clean a dataset by removing duplicate entries and filling missing values with the mean of the respective column.

# Import required libraries
import pandas as pd
import numpy as np

# Sample dataset with duplicates and missing values
data = {
    'ID': [1, 2, 3, 4, 5, 2, 6],  # Index 5 is an exact duplicate of index 1 (ID = 2)
    'Age': [25, 30, 35, np.nan, 45, 30, np.nan],  # Missing values in Age
    'Salary': [50000, 60000, np.nan, 80000, 90000, 60000, 70000]  # Missing value in Salary
}

# Create a DataFrame
df = pd.DataFrame(data)

print("Original Dataset:")
print(df)

# Remove duplicate rows (all columns must match); the first occurrence is kept
df = df.drop_duplicates()

# Fill missing values in each numeric column with that column's mean
# (computed after de-duplication, so the repeated row does not skew it)
df = df.fillna(df.mean(numeric_only=True))

print("\nCleaned Dataset:")
print(df)

Original Dataset:
   ID   Age   Salary
0   1  25.0  50000.0
1   2  30.0  60000.0
2   3  35.0      NaN
3   4   NaN  80000.0
4   5  45.0  90000.0
5   2  30.0  60000.0
6   6   NaN  70000.0

Cleaned Dataset:
   ID    Age   Salary
0   1  25.00  50000.0
1   2  30.00  60000.0
2   3  35.00  70000.0
3   4  33.75  80000.0
4   5  45.00  90000.0
6   6  33.75  70000.0
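The filled values in the cleaned output can be checked by hand. After the duplicate row is dropped, Age has four known values (25, 30, 35, 45), so its mean is 135 / 4 = 33.75; Salary has five known values summing to 350,000, so its mean is 70,000. A minimal sketch, reusing the same sample data (the printed wording is just an illustration), compares the column means with and without the duplicate row to show that the order of the two cleaning steps changes the fill values:

```python
import numpy as np
import pandas as pd

# Same sample data as the program above
data = {
    "ID": [1, 2, 3, 4, 5, 2, 6],
    "Age": [25, 30, 35, np.nan, 45, 30, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000, 60000, 70000],
}
df = pd.DataFrame(data)

# Means if the duplicate row (index 5) were still present; NaNs are skipped
means_before = df.mean(numeric_only=True)
# Means after removing the duplicate, which is what the program fills with
means_after = df.drop_duplicates().mean(numeric_only=True)

for col in ["Age", "Salary"]:
    print(f"{col}: mean with duplicate = {means_before[col]:.2f}, "
          f"without duplicate = {means_after[col]:.2f}")
# Age: mean with duplicate = 33.00, without duplicate = 33.75
# Salary: mean with duplicate = 68333.33, without duplicate = 70000.00
```

If the missing values were filled before de-duplication, Age would be filled with 33.0 and Salary with about 68,333.33, because the repeated row (Age 30, Salary 60,000) would be counted twice.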

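To reuse the same two steps on other tables, they can be wrapped in a function. This is a minimal sketch under a few assumptions: the name `clean_dataset` and its `subset` and `reset_index` parameters are hypothetical (not part of the original program), the function returns a new DataFrame instead of modifying its input, and it first reports how many missing values and duplicate rows the raw data contains using `DataFrame.isna()` and `DataFrame.duplicated()`.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def clean_dataset(
    df: pd.DataFrame,
    subset: Optional[List[str]] = None,
    reset_index: bool = False,
) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows, then fill numeric NaNs with column means.

    Returns a new DataFrame; the input is left unchanged.
    """
    # subset=None compares all columns; keep="first" keeps the first occurrence
    cleaned = df.drop_duplicates(subset=subset, keep="first")
    # Means are taken after de-duplication so repeated rows do not skew them
    cleaned = cleaned.fillna(cleaned.mean(numeric_only=True))
    if reset_index:
        # Renumber rows 0..n-1 instead of keeping the original index labels
        cleaned = cleaned.reset_index(drop=True)
    return cleaned


# Same sample data as the original program
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 2, 6],
    "Age": [25, 30, 35, np.nan, 45, 30, np.nan],
    "Salary": [50000, 60000, np.nan, 80000, 90000, 60000, 70000],
})

# Inspect the raw data before cleaning
print("Missing values per column:")
print(raw.isna().sum())
print("Duplicate rows:", raw.duplicated().sum())

result = clean_dataset(raw, reset_index=True)
print(result)
```

Passing `subset=["ID"]` would treat any repeated ID as a duplicate even if the other fields differ; with this sample data both choices drop the same row, because index 5 repeats index 1 exactly. Setting `reset_index=True` renumbers the rows 0 to 5, whereas the original output keeps the label 6 for the last row.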

https://colab.research.google.com/drive/1oavejkqUovkIr12hdg44TYQOszJqwIyR
