22IZ023 Nikhil - Exercise 5 - Data Preprocessing

Uploaded by

nikhildeutsch03

Exercise-5

Aim
To perform data preprocessing techniques such as handling missing values,
standardization, and normalization using Python.

Logic Description
Data preprocessing involves cleaning and transforming raw data to improve its
quality for analysis. This includes handling missing values, standardizing data
distributions, and normalizing data scales.
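
To make the two scaling operations concrete, the sketch below computes them by hand on a small, made-up price series (the values are illustrative and do not come from the exercise's dataset). Standardization subtracts the mean and divides by the standard deviation; normalization, in the min-max sense used in this exercise, subtracts the minimum and divides by the range.

```python
import pandas as pd

# Hypothetical prices, chosen only for illustration
price = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])

# Standardization (z-score): (x - mean) / std, using the population std (ddof=0)
standardized = (price - price.mean()) / price.std(ddof=0)

# Normalization (min-max): (x - min) / (max - min)
normalized = (price - price.min()) / (price.max() - price.min())

print(standardized.round(3).tolist())  # [-1.414, -0.707, 0.0, 0.707, 1.414]
print(normalized.tolist())             # [0.0, 0.25, 0.5, 0.75, 1.0]
```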

Algorithm

1. Load the dataset using Pandas.

2. Identify and handle missing values using mean and median imputation.
3. Remove rows with missing values.
4. Standardize the "Price" column using StandardScaler.
5. Normalize the "Price" column using MinMaxScaler.
6. Save the processed datasets to CSV files.
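
The missing-value steps (2 and 3) can be tried on a small in-memory table before touching a real file. The DataFrame below is hypothetical: its "Age" and "Price" column names mirror the exercise, but the values are invented. It also uses pandas' own fillna rather than scikit-learn's SimpleImputer, purely to keep the sketch short.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 1000.0 is a deliberate outlier in "Price"
df = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 40.0],
    "Price": [10.0, 20.0, np.nan, 1000.0],
})

# Step 2a: count missing values per column
missing_counts = df.isnull().sum()

# Step 2b: fill gaps with each column's mean or median
df_mean = df.fillna(df.mean())
df_median = df.fillna(df.median())

# Step 3: alternatively, drop every row that has a missing value
df_dropna = df.dropna()

print(missing_counts)
print(df_mean.loc[2, "Price"], df_median.loc[2, "Price"])
print(len(df_dropna))
```

With the outlier present, the mean fill for "Price" is pulled far above the typical values, while the median fill is 20.0. This is the usual reason to prefer the median for skewed columns.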

Package/Tools Description

Sl. No.   Name of Package/Tool   Description

1         Pandas                 Data manipulation and analysis

2         Scikit-learn           Machine learning library for preprocessing

Source Code:-
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Step 1: Load the dataset
df = pd.read_csv('D:/123.csv')

# Step 2: Identify missing values in each column
print("Missing Values:\n", df.isnull().sum())

# Step 3: Create df_dropna by removing rows with missing values
df_dropna = df.dropna()
print("\nDataset after dropping missing values:\n", df_dropna.head())

# Step 4: Fill missing values in numeric columns using mean
imputer_mean = SimpleImputer(strategy="mean")
df_mean_filled = df.copy()
df_mean_filled[["Age", "Price"]] = imputer_mean.fit_transform(
    df_mean_filled[["Age", "Price"]]
)

# Step 5: Fill missing values in numeric columns using median
imputer_median = SimpleImputer(strategy="median")
df_median_filled = df.copy()
df_median_filled[["Age", "Price"]] = imputer_median.fit_transform(
    df_median_filled[["Age", "Price"]]
)

# Step 6: Standardize the "Price" column using StandardScaler
scaler_standard = StandardScaler()
df_standardized_price = df.copy()
# Using mean-filled data
df_standardized_price["Price"] = scaler_standard.fit_transform(
    df_mean_filled[["Price"]]
)

# Step 7: Normalize the "Price" column using MinMaxScaler
scaler_minmax = MinMaxScaler()
df_normalized_price = df.copy()
# Using mean-filled data
df_normalized_price["Price"] = scaler_minmax.fit_transform(
    df_mean_filled[["Price"]]
)

# Display outputs
print("\nStandardized 'Price' column:\n", df_standardized_price["Price"].head())
print("\nNormalized 'Price' column:\n", df_normalized_price["Price"].head())

# Optionally: Save the processed datasets to CSV
df_dropna.to_csv("df_dropna.csv", index=False)
df_mean_filled.to_csv("df_mean_filled.csv", index=False)
df_median_filled.to_csv("df_median_filled.csv", index=False)
df_standardized_price.to_csv("df_standardized_price.csv", index=False)
df_normalized_price.to_csv("df_normalized_price.csv", index=False)

Output Terminal:-
Test Cases

Test Case   Input                         Expected Output

1           Dataset with missing values   Missing values count

2           Dataset after dropna()        Data without missing values

3           Standardization of "Price"    Standardized values

4           Normalization of "Price"      Normalized values
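
The four test cases can also be written as executable checks. The sketch below is one possible way to do that, on a hypothetical five-row table rather than the exercise's CSV file. The helper name run_checks and all sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def run_checks(df):
    """Run the four test cases on a frame with numeric "Age" and "Price" columns."""
    results = {}

    # Case 1: missing-value count per column
    results["missing"] = df.isnull().sum().to_dict()

    # Case 2: no missing values remain after dropna()
    results["dropna_clean"] = not df.dropna().isnull().values.any()

    # Mean-impute before scaling, as in the source code
    filled = df.copy()
    filled[["Age", "Price"]] = SimpleImputer(strategy="mean").fit_transform(
        filled[["Age", "Price"]]
    )

    # Case 3: standardized "Price" has (approximately) zero mean
    z = StandardScaler().fit_transform(filled[["Price"]]).ravel()
    results["zero_mean"] = bool(np.isclose(z.mean(), 0.0))

    # Case 4: normalized "Price" spans the range [0, 1]
    m = MinMaxScaler().fit_transform(filled[["Price"]]).ravel()
    results["unit_range"] = bool(np.isclose(m.min(), 0.0) and np.isclose(m.max(), 1.0))

    return results


# Hypothetical sample data
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 31.0, 45.0, 38.0],
    "Price": [150.0, 300.0, np.nan, 450.0, 600.0],
})
print(run_checks(sample))
```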

Inferences

● Mean and median imputation fill missing numeric values without discarding rows; the median is less sensitive to outliers than the mean.

● Standardization rescales "Price" to zero mean and unit variance.
● Normalization (min-max scaling) rescales "Price" to the range 0 to 1.
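
One detail behind the unit-variance statement is worth checking numerically. StandardScaler divides by the population standard deviation (ddof=0), whereas pandas' Series.std() defaults to the sample standard deviation (ddof=1), so a standardized column can report a standard deviation slightly above 1 when inspected with pandas defaults. The sketch below uses hypothetical values to show the difference.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical prices, for illustration only
prices = pd.DataFrame({"Price": [100.0, 200.0, 300.0, 400.0, 500.0]})

scaled = pd.Series(StandardScaler().fit_transform(prices[["Price"]]).ravel())

std_population = scaled.std(ddof=0)  # matches the scaler's definition
std_sample = scaled.std()            # pandas default, ddof=1

print(round(std_population, 6), round(std_sample, 6))
```

For five values the sample figure comes out near 1.118 (the square root of 5/4); the gap shrinks as the number of rows grows.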

Result
Data preprocessing was successfully implemented using Python, ensuring the
dataset is clean and well-prepared for further analysis.
