LAB EXERCISE 2 - Data Preprocessing

Uploaded by

shreya halaswamy

LAB EXERCISE – 2

Data Preprocessing

Aim of the Experiment.

The main aim of this experiment is to preprocess the given dataset. The dataset has already been
created and is available in the file [Link].

Sample Dataset

id   first        last       gender  Marks  selected
1    Leone        Debrick    Female  50     TRUE
2    Romola       Phinnessy  Female  60     FALSE
3    Geri         Prium      Male    65     FALSE
4    Sandy        Doveston   Female  95     FALSE
5    Jacenta      Jansik     Female  31     TRUE
6    Diane-marie  Medhurst   Female  45     TRUE
7    Austen       Pool       Male    45     TRUE
8    Vanya        Teffrey    Male    70     FALSE
9    Giordano     Elloy      Male    36     FALSE
10   Rozele       Fawcett    Female  50     FALSE

The objectives of this experiment are

1. Explore Label Encoder

2. Explore Scikit Preprocessing routines like Scaling

3. Explore Scikit Preprocessing routines like Binarizer

Reference to the Textbook and Explanation

All the fundamentals are given in Chapter 2 and Appendix 2.

The values Female and Male in the gender column of the dataset can be encoded as 0 and 1 using
Label Encoder. It is done as given below:

df_gender_encode=LabelEncoder()

[Link]=df_gender_encode.fit_transform([Link])
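The object names in the snippet above were lost and appear as [Link]. As a minimal sketch, assuming the encoder is scikit-learn's LabelEncoder applied to the gender column, the encoding could look like this (the four-row DataFrame is a hypothetical miniature of the sample dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature of the sample dataset; only the gender column matters here.
df = pd.DataFrame({
    "first": ["Leone", "Geri", "Sandy", "Austen"],
    "gender": ["Female", "Male", "Female", "Male"],
})

encoder = LabelEncoder()
# fit_transform learns the sorted set of labels and replaces each label by its index.
df["gender"] = encoder.fit_transform(df["gender"])

print(encoder.classes_)       # learned labels, in sorted order
print(df["gender"].tolist())  # integer codes, one per row
```

Because the labels are sorted before codes are assigned, Female receives 0 and Male receives 1.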

Scaling can be done as follows:

[Link] = [Link]([Link])

scaled_df= [Link]([Link])

Scaling removes the mean, so the scaled marks are centered on zero.
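The scaling call itself is hidden behind [Link] placeholders. Assuming it refers to the scale function in sklearn.preprocessing (an assumption, not confirmed by the text), the following sketch scales the Marks values of the sample dataset and compares the result with the formula (x - mean) / standard deviation:

```python
import numpy as np
from sklearn import preprocessing

# Marks column of the sample dataset, as one feature column (10 rows, 1 column).
marks = np.array([50, 60, 65, 95, 31, 45, 45, 70, 36, 50], dtype=float).reshape(-1, 1)

# Assumed call: preprocessing.scale removes the mean and divides by the standard deviation.
scaled = preprocessing.scale(marks)

# The same computation by hand, using the population standard deviation (ddof=0).
manual = (marks - marks.mean()) / marks.std()

print(scaled.ravel())
print("mean:", scaled.mean(), "std:", scaled.std())
```

The printed mean is zero up to rounding error and the standard deviation is one.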

Copyright @ Oxford University Press, India 2021

Binarization compares each value with a threshold and converts it to a binary value: values above the threshold become 1 and all other values become 0, as shown below:

scaled_df_bin = [Link](threshold=0.5).transform(newarr)
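As a small illustration, assuming the hidden class is Binarizer from sklearn.preprocessing (the objectives name Binarizer, but the exact call is not visible), the threshold behaviour can be checked on a few hypothetical values:

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical scaled values, shaped as a single feature column.
values = np.array([-1.2, 0.0, 0.5, 0.51, 2.0]).reshape(-1, 1)

# Values strictly greater than the threshold map to 1, all others map to 0.
binary = Binarizer(threshold=0.5).fit_transform(values)

print(binary.ravel())
```

Note that a value exactly equal to the threshold (0.5 here) is mapped to 0.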

Duplicates can be removed as follows:

df_duplicates_removed = [Link].drop_duplicates(df_duplicated)
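A minimal sketch of the same idea, assuming pandas is imported as pd and using a hypothetical three-row DataFrame; here drop_duplicates is called as a method on the duplicated DataFrame:

```python
import pandas as pd

# Hypothetical three-row DataFrame.
df = pd.DataFrame({"id": [1, 2, 3], "Marks": [50, 60, 65]})

# Create duplicates by concatenating the DataFrame with itself.
df_duplicated = pd.concat([df] * 2, ignore_index=True)

# Keep only the first occurrence of each repeated row.
df_duplicates_removed = df_duplicated.drop_duplicates()

print(len(df_duplicated), "rows before,", len(df_duplicates_removed), "rows after")
```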

The NaN values of a column can be replaced as shown below:

df['m5']=df['m5'].fillna(0)

This replaces every NaN in column m5 with zero.
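The sketch below shows fillna with a constant and with the column mean. The values are the numeric forms of columns m2 and m5 from the NaN listing later in this exercise; the variable names zero_filled and mean_filled are illustrative only:

```python
import numpy as np
import pandas as pd

# Columns m2 and m5 after non-numeric entries have been turned into NaN.
df = pd.DataFrame({
    "m2": [60.0, np.nan, 60.0, np.nan, 80.0],
    "m5": [np.nan, np.nan, np.nan, 10.0, 20.0],
})

# Replace NaN with a constant.
zero_filled = df["m5"].fillna(0)

# Replace NaN with the mean of the non-missing values of the same column.
mean_filled = df["m2"].fillna(df["m2"].mean())

print(zero_filled.tolist())
print(mean_filled.tolist())
```

Assigning the result back to the column, as in df['m5']=df['m5'].fillna(0), is generally safer than calling fillna with inplace=True on a selected column.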

The command,

df=[Link](axis=1)

removes every column that contains at least one NaN.
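Assuming the hidden call is the pandas dropna method, the difference between dropping columns (axis=1) and dropping rows (axis=0) can be sketched on a hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical marks with a few missing values.
df = pd.DataFrame({
    "m1": [50.0, np.nan, 60.0],
    "m2": [60.0, 70.0, 80.0],
    "m3": [np.nan, 70.0, 60.0],
})

# axis=1 drops every column that contains at least one NaN.
cols_dropped = df.dropna(axis=1)

# axis=0 (the default) drops every row that contains at least one NaN.
rows_dropped = df.dropna(axis=0)

print(list(cols_dropped.columns))
print(list(rows_dropped.index))
```

Only m2 survives the column-wise drop, and only the last row survives the row-wise drop.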

Listing 1

import pandas as pd

col_list=["id","first","last","gender","Marks","selected"]

df = pd.read_csv("[Link]",usecols=col_list)

print(df)

print("End of Listing\n\n\n")

# Let us convert the gender column so that Female becomes 0 and

# Male becomes 1, using LabelEncoder from scikit-learn

from [Link] import LabelEncoder

df_gender_encode=LabelEncoder()

[Link]=df_gender_encode.fit_transform([Link])

# One can observe that Female is coded as 0 and Male as 1

print(df)

print("End of Listing\n\n\n")

# Now one can scale the marks to remove the mean

from sklearn import preprocessing

[Link] = [Link]([Link])

scaled_df= [Link]([Link])

print(df)

print("Scaling of marks is completed\n\n\n\n")

newarr = scaled_df.reshape(-1,1)

scaled_df_bin = [Link](threshold=0.5).transform(newarr)

df['Marks']=scaled_df_bin

print(df)

print("Binarization of marks is completed\n\n\n\n")

Output

Listing 2

import pandas as pd

col_list=["id","first","last","gender","Marks","selected"]

df = pd.read_csv("[Link]",usecols=col_list)

print(df)

print("End of Listing\n\n\n")

# Let us create duplicate rows in the given dataset

# This is done by concatenating the DataFrame with itself, as given below

df_duplicated = [Link]([df]*2, ignore_index=True)

print(df_duplicated)

print("Display after duplication\n\n\n\n")

df_duplicates_removed = [Link].drop_duplicates(df_duplicated)

print(df_duplicates_removed)

print("Display after removing duplicates\n\n\n\n")

Output

Listing 3

import pandas as pd

df = [Link]({

'm1':[50,'A',60,'A',80],

'm2':[60,'A','60','A',80],

'm3':[50,70,'A','A',60],

'm4':[60,'A','A','A',60],

'm5':['A','A','A',10,20]

})

df = [Link](pd.to_numeric,errors='coerce')

print(df)

print('Dataframe with NaN\n\n\n')

# Replace all the NaN values in m5 with zero

df['m5']=df['m5'].fillna(0)

print(df)

print('Making m5 NaN as 0 using fillna() function\n\n\n\n')

df1 = [Link]()

# Replace the NaN values in m2 with the column mean

df1['m2'] = df1['m2'].fillna(df1['m2'].mean())

print(df1)

print('Making m2 NaN as mean using fillna() function\n\n\n\n')

df2 = [Link]()

# Replace the NaN values in m3 with the column median

df2['m3'] = df2['m3'].fillna(df2['m3'].median())

print(df2)

print('Making m3 NaN as median using fillna() function\n\n\n\n')

# Dropping all columns having NaN

df=[Link](axis=1)

print(df)

print('Dropping all columns having NaN\n\n\n\n')

Output

Listing 4

This listing illustrates the use of MinMax scaling and Standard scaling for finding Z-scores.

from numpy import asarray

from [Link] import MinMaxScaler

from [Link] import StandardScaler

data = asarray([[1,3],[8,5],[6,7],[8,9]])

print("\n Original Data")

print(data)

scaler1 = MinMaxScaler()

scaler2 = StandardScaler()

scaled1 = scaler1.fit_transform(data)

scaled2 = scaler2.fit_transform(data)

print("\n\nThe output of MinMax Scaling")

print(scaled1)

print("\n\nThe output of Standard scaling as z-score")

print(scaled2)
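To see what the two scalers compute, the sketch below reproduces them with plain NumPy on the same data: MinMax scaling as (x - min) / (max - min) and standard scaling as (x - mean) / standard deviation, each applied per column. The imports assume that both scalers come from sklearn.preprocessing, since the import lines of the listing are only partly visible.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.asarray([[1, 3], [8, 5], [6, 7], [8, 9]], dtype=float)

# MinMax scaling by hand: map each column onto the range [0, 1].
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual_minmax = (data - col_min) / (col_max - col_min)

# Standard scaling by hand: z-score with the population standard deviation.
manual_z = (data - data.mean(axis=0)) / data.std(axis=0)

# Compare with the scikit-learn scalers.
print(np.allclose(manual_minmax, MinMaxScaler().fit_transform(data)))
print(np.allclose(manual_z, StandardScaler().fit_transform(data)))
```

Both comparisons print True. The scaling is done column by column, so each feature is treated independently.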

Output
