22IZ023 Nikhil - Exercise 5 - Data Preprocessing

Uploaded by

nikhildeutsch03

Exercise-5

Aim
To perform data preprocessing techniques such as handling missing values,
standardization, and normalization using Python.

Logic Description
Data preprocessing cleans and transforms raw data to improve its quality for
analysis. This exercise covers three tasks: handling missing values (by dropping
rows or by imputing the mean or median), standardizing a column to zero mean and
unit variance, and normalizing a column to the range 0 to 1.
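
The two rescaling steps can be written out explicitly. As a sketch, let x be a single "Price" value, with μ, σ, x_min and x_max computed over the whole column. Here σ is taken to be the population standard deviation, which is what scikit-learn's StandardScaler uses.

```latex
z = \frac{x - \mu}{\sigma}, \qquad
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```

For an illustrative column with prices 10, 20, 30 (made-up values, not taken from the dataset), μ = 20 and σ ≈ 8.165, so the standardized values are approximately -1.225, 0, 1.225 and the normalized values are 0, 0.5, 1.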

Algorithm

1. Load the dataset using Pandas.
2. Identify and handle missing values using mean and median imputation.
3. Remove rows with missing values.
4. Standardize the "Price" column using StandardScaler.
5. Normalize the "Price" column using MinMaxScaler.
6. Save the processed datasets to CSV files.

Package/Tools Description

Sl. No.   Name of Package/Tool   Description
1         Pandas                 Data manipulation and analysis
2         Scikit-learn           Machine learning library for preprocessing

Source Code:-
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Step 1: Load the dataset
df = pd.read_csv('D:/123.csv')

# Step 2: Identify missing values in each column
print("Missing Values:\n", df.isnull().sum())

# Step 3: Create df_dropna by removing rows with missing values
df_dropna = df.dropna()
print("\nDataset after dropping missing values:\n", df_dropna.head())

# Step 4: Fill missing values in numeric columns using mean
imputer_mean = SimpleImputer(strategy="mean")
df_mean_filled = df.copy()
df_mean_filled[["Age", "Price"]] = imputer_mean.fit_transform(
    df_mean_filled[["Age", "Price"]]
)

# Step 5: Fill missing values in numeric columns using median
imputer_median = SimpleImputer(strategy="median")
df_median_filled = df.copy()
df_median_filled[["Age", "Price"]] = imputer_median.fit_transform(
    df_median_filled[["Age", "Price"]]
)

# Step 6: Standardize the "Price" column using StandardScaler
scaler_standard = StandardScaler()
df_standardized_price = df.copy()
# Using mean-filled data, so "Price" has no missing values when scaled
df_standardized_price["Price"] = scaler_standard.fit_transform(
    df_mean_filled[["Price"]]
)

# Step 7: Normalize the "Price" column using MinMaxScaler
scaler_minmax = MinMaxScaler()
df_normalized_price = df.copy()
# Using mean-filled data, so "Price" has no missing values when scaled
df_normalized_price["Price"] = scaler_minmax.fit_transform(
    df_mean_filled[["Price"]]
)

# Display outputs
print("\nStandardized 'Price' column:\n", df_standardized_price["Price"].head())
print("\nNormalized 'Price' column:\n", df_normalized_price["Price"].head())

# Optionally: Save the processed datasets to CSV
df_dropna.to_csv("df_dropna.csv", index=False)
df_mean_filled.to_csv("df_mean_filled.csv", index=False)
df_median_filled.to_csv("df_median_filled.csv", index=False)
df_standardized_price.to_csv("df_standardized_price.csv", index=False)
df_normalized_price.to_csv("df_normalized_price.csv", index=False)
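
The input file D:/123.csv is not reproduced in this report, so the imputation steps can instead be tried on a small in-memory DataFrame. The sketch below uses made-up "Age" and "Price" values (an assumption for illustration only) and shows how mean and median imputation can fill the same gap with different numbers when the column contains a large value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data; these values are NOT from D:/123.csv.
df = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 40.0],
    "Price": [100.0, 200.0, np.nan, 900.0],
})
cols = ["Age", "Price"]

# Mean imputation: each gap is replaced by the column mean.
df_mean_filled = df.copy()
df_mean_filled[cols] = SimpleImputer(strategy="mean").fit_transform(df[cols])

# Median imputation: each gap is replaced by the column median.
df_median_filled = df.copy()
df_median_filled[cols] = SimpleImputer(strategy="median").fit_transform(df[cols])

print("Mean-filled:\n", df_mean_filled)
print("\nMedian-filled:\n", df_median_filled)
```

With this sample, the missing "Price" is filled with 400 under the mean strategy and 200 under the median strategy, because the value 900 pulls the mean upward but does not move the median.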

Output Terminal:-

Test Cases

Test Case   Input                         Expected Output
1           Dataset with missing values   Missing values count
2           Dataset after dropna()        Data without missing values
3           Standardization of "Price"    Standardized values
4           Normalization of "Price"      Normalized values

Inferences

● Mean and median imputation fill the missing "Age" and "Price" values without discarding rows, whereas dropna() removes every row that contains a missing value.
● Standardization rescales "Price" to zero mean and unit variance.
● Normalization rescales "Price" to the range 0 to 1.
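
The last two inferences can be checked numerically. The following is a minimal sketch on a made-up "Price" column (the values are assumptions, not the exercise dataset) that computes the mean and standard deviation of the standardized values and the minimum and maximum of the normalized values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical prices with no missing values; NOT from D:/123.csv.
prices = pd.DataFrame({"Price": [100.0, 200.0, 400.0, 900.0]})

# fit_transform returns an (n, 1) array; ravel() flattens it to 1-D.
standardized = StandardScaler().fit_transform(prices[["Price"]]).ravel()
normalized = MinMaxScaler().fit_transform(prices[["Price"]]).ravel()

# NumPy's std() defaults to the population formula (ddof=0),
# which matches the convention StandardScaler uses.
print("standardized mean:", standardized.mean())
print("standardized std :", standardized.std())
print("normalized min   :", normalized.min())
print("normalized max   :", normalized.max())
```

One detail to keep in mind: pandas' Series.std() defaults to the sample formula (ddof=1), so calling it on the standardized column gives a value slightly above 1 for small datasets even though the scaling is correct.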

Result
Data preprocessing was successfully implemented using Python, ensuring the
dataset is clean and well-prepared for further analysis.
