Hands-on Data Preprocessing in Python

Uploaded by

Shahmir Yousaf

1. Loading and Inspecting the Data

• Load a dataset from a CSV file.

• Display the first and last 10 rows of the dataset.

• Identify the data types of each column.

• Count the number of rows and columns in the dataset.

• Check for null (missing) values and count them in each column.
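The inspection tasks above can be sketched as follows. This is a minimal example that assumes the pandas library (the handout does not name one); the inline CSV text and its column names are hypothetical stand-ins for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = """name,age,salary
Ali,25,50000
Sara,,62000
Omar,41,
Hina,33,58000
"""

# read_csv accepts a file path or a file-like object; pass your own path here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(10))   # first 10 rows (fewer if the dataset is smaller)
print(df.tail(10))   # last 10 rows
print(df.dtypes)     # data type of each column

n_rows, n_cols = df.shape         # (number of rows, number of columns)
null_counts = df.isnull().sum()   # count of missing values per column
print(n_rows, n_cols)
print(null_counts)
```

With a real dataset, replace the `io.StringIO(...)` argument with the path to the CSV file.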

2. Handling Missing Values

• Replace missing numerical values with the mean of the respective column.

• Replace missing numerical values with the median of the respective column.

• Replace missing numerical values with a constant value of your choice.

• Drop rows with missing values.

• Fill missing categorical values with the most frequent value in the column.
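One possible way to try each of these strategies, assuming pandas and a small hypothetical DataFrame. Each strategy writes to a new variable so the results can be compared; in practice you would normally pick one per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing number and one missing category.
df = pd.DataFrame({
    "age": [25.0, np.nan, 41.0, 33.0],
    "city": ["Lahore", "Karachi", None, "Lahore"],
})

# Numerical column: three alternative fill strategies.
mean_filled = df["age"].fillna(df["age"].mean())
median_filled = df["age"].fillna(df["age"].median())
constant_filled = df["age"].fillna(0)   # 0 is an arbitrary constant

# Drop every row that contains at least one missing value.
dropped = df.dropna()

# Categorical column: fill with the most frequent value (the mode).
# mode() returns a Series because ties are possible; [0] takes the first.
mode_filled = df["city"].fillna(df["city"].mode()[0])
```

Note that the mean and median are computed from the non-missing values only, which is the default behavior of these pandas methods.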

3. Data Cleaning

• Remove duplicate rows from the dataset.

• Drop unnecessary columns from the dataset.

• Rename columns to have consistent naming conventions.

• Standardize text data to lowercase or uppercase.

• Remove leading and trailing whitespace from text columns.
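A short sketch of the cleaning steps above, assuming pandas. The column names, the choice of snake_case as the naming convention, and the decision to lowercase the text are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical messy data: a duplicate row, an unneeded column,
# inconsistent column names, mixed case, and stray whitespace.
df = pd.DataFrame({
    "First Name": ["  Ali ", "Sara", "  Ali ", "OMAR"],
    "Temp ID": [1, 2, 1, 3],
    "City": ["lahore", "KARACHI ", "lahore", " Multan"],
})

df = df.drop_duplicates()            # the third row repeats the first
df = df.drop(columns=["Temp ID"])    # drop a column that is not needed

# Rename columns to snake_case (one possible convention).
df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# Strip whitespace and lowercase the text columns, listed explicitly.
text_cols = ["first_name", "city"]
for col in text_cols:
    df[col] = df[col].str.strip().str.lower()

print(df)
```

Duplicates are removed before the text is normalized here, so rows that differ only in case or whitespace would survive; running `drop_duplicates()` again after normalization catches those.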

4. Encoding Categorical Data

• Convert categorical variables into numeric form using:

o One-hot encoding.

o Label encoding.

o Mapping (e.g., Male = 0, Female = 1).

• Handle categorical columns with many categories (high cardinality, e.g., more than 10 unique values).
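The three encodings can be sketched with pandas alone, as below. The data is hypothetical, and grouping rare categories into "Other" is only one of several ways to handle a high-cardinality column; the cutoff of one category is chosen to suit this tiny example.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "city": ["Lahore", "Karachi", "Lahore", "Multan"],
})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["city"], prefix="city")

# Label encoding: integer codes assigned in sorted order of the labels.
df["city_label"] = df["city"].astype("category").cat.codes

# Explicit mapping, as in the example from the task list.
df["gender_code"] = df["gender"].map({"Male": 0, "Female": 1})

# High cardinality: keep the most frequent categories, group the rest.
top_n = 1   # illustrative cutoff; a real dataset might keep the top 10
top = df["city"].value_counts().nlargest(top_n).index
df["city_grouped"] = df["city"].where(df["city"].isin(top), "Other")

print(one_hot)
print(df)
```

Label encoding implies an order among categories that may not exist, so one-hot encoding is often the safer choice for nominal data.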

5. Feature Scaling

• Normalize numerical columns to a range of [0, 1].

• Standardize numerical columns to have a mean of 0 and a standard deviation of 1.

• Apply min-max scaling to numerical columns.
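Normalizing to [0, 1] and min-max scaling are the same operation, so the sketch below shows it once alongside standardization. It assumes pandas and a hypothetical salary column, and writes the formulas out directly rather than using a scaling library.

```python
import pandas as pd

df = pd.DataFrame({"salary": [30000.0, 40000.0, 50000.0, 60000.0, 70000.0]})
col = df["salary"]

# Min-max scaling: (x - min) / (max - min) maps values onto [0, 1].
df["salary_minmax"] = (col - col.min()) / (col.max() - col.min())

# Standardization (z-score): (x - mean) / std gives mean 0 and std 1.
# ddof=0 uses the population standard deviation; pandas defaults to ddof=1.
df["salary_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

When a train/test split is involved, the minimum, maximum, mean, and standard deviation should be computed on the training set only and then reused on the test set.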

6. Outlier Detection and Handling

• Identify outliers in numerical columns using:

o Interquartile Range (IQR).

o Z-score.

• Remove rows with outliers.

• Cap or floor outliers to a maximum or minimum threshold.
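A minimal sketch of both detection rules and both handling options, assuming pandas. The sample values are hypothetical, and the 1.5 x IQR fence and the z-score threshold of 3 are common conventions rather than requirements of the assignment.

```python
import pandas as pd

# Hypothetical sample: 18 ordinary values plus one extreme value (95).
values = [10, 12, 11, 13, 12, 11, 10, 13, 12] * 2 + [95]
s = pd.Series(values, dtype="float64", name="value")

# IQR rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = (s < lower) | (s > upper)

# Z-score rule: flag points more than 3 standard deviations from the mean.
z = (s - s.mean()) / s.std(ddof=0)
z_outliers = z.abs() > 3

# Option 1: remove the flagged rows.
removed = s[~iqr_outliers]

# Option 2: cap and floor values at the IQR fences.
capped = s.clip(lower=lower, upper=upper)

print(s[iqr_outliers], s[z_outliers], sep="\n")
```

On very small samples the z-score rule can miss even an obvious outlier, because the extreme value inflates the standard deviation it is measured against.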

7. Feature Engineering

• Create new features based on existing columns (e.g., age groups, salary ranges).

• Combine multiple columns into one (e.g., full name from first and last name).

• Extract information from columns (e.g., extracting year from a date column).

• Calculate summary statistics for groups (e.g., average salary by gender).
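The four tasks above might look like this in pandas. The DataFrame, the age-group boundaries, and the group labels are hypothetical choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Ali", "Sara", "Omar", "Hina"],
    "last_name": ["Khan", "Ahmed", "Raza", "Malik"],
    "gender": ["Male", "Female", "Male", "Female"],
    "age": [17, 25, 41, 68],
    "salary": [0, 50000, 80000, 60000],
    "joined": ["2019-03-15", "2020-07-01", "2015-11-20", "2010-01-05"],
})

# New feature from bins: intervals are (0, 18], (18, 40], (40, 65], (65, 120].
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 18, 40, 65, 120],
    labels=["child", "young_adult", "adult", "senior"],
)

# Combine two columns into one.
df["full_name"] = df["first_name"] + " " + df["last_name"]

# Extract the year from a date stored as text.
df["join_year"] = pd.to_datetime(df["joined"]).dt.year

# Summary statistic per group: average salary by gender.
avg_salary_by_gender = df.groupby("gender")["salary"].mean()

print(df)
print(avg_salary_by_gender)
```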

8. Data Transformation

• Log-transform skewed numerical columns.

• Apply square-root transformation to reduce the impact of large values.

• Normalize text data by removing special characters.

• Split a column into multiple columns (e.g., splitting a full name into first and last names).
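A sketch of the transformations above, assuming pandas and NumPy with hypothetical columns. It uses `log1p`, which computes log(1 + x), so that zero values do not produce negative infinity; a plain logarithm is equally valid for strictly positive data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [0.0, 9.0, 99.0, 9999.0],
    "comment": ["Great!!", "so-so...", "#1 pick", "ok"],
    "full_name": ["Ali Khan", "Sara Ahmed", "Omar Raza", "Hina Malik"],
})

# Log transform for right-skewed data; log1p is defined at zero.
df["income_log"] = np.log1p(df["income"])

# Square-root transform: a milder compression of large values.
df["income_sqrt"] = np.sqrt(df["income"])

# Remove everything except letters, digits, and whitespace.
df["comment_clean"] = df["comment"].str.replace(
    r"[^A-Za-z0-9\s]", "", regex=True
)

# Split on the first space only, so multi-part last names stay together.
df[["first_name", "last_name"]] = df["full_name"].str.split(
    " ", n=1, expand=True
)

print(df)
```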

9. Working with Date/Time Data

• Convert a column to datetime format.

• Extract year, month, and day from a date column.

• Calculate the difference in days between two date columns.

• Group data by time periods (e.g., monthly or yearly).
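The date and time tasks can be sketched as follows, assuming pandas. The order and shipping columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-10", "2025-03-01"],
    "ship_date": ["2024-01-08", "2024-01-25", "2024-02-11", "2025-03-11"],
    "amount": [100, 150, 200, 50],
})

# Convert text columns to datetime.
df["order_date"] = pd.to_datetime(df["order_date"])
df["ship_date"] = pd.to_datetime(df["ship_date"])

# Extract the date parts through the .dt accessor.
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day"] = df["order_date"].dt.day

# Subtracting two datetime columns gives a timedelta; .dt.days is whole days.
df["days_to_ship"] = (df["ship_date"] - df["order_date"]).dt.days

# Group by month and by year, summing the amounts in each period.
monthly = df.groupby(df["order_date"].dt.to_period("M"))["amount"].sum()
yearly = df.groupby(df["order_date"].dt.year)["amount"].sum()

print(df)
print(monthly)
print(yearly)
```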

10. Splitting and Exporting Data

• Split the dataset into training and testing sets.

• Save the cleaned dataset to a new CSV file.

• Save specific columns or subsets of the dataset to a file.
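One way to do the split and export with pandas alone is sketched below. The 80/20 ratio, the fixed seed, the file names, and the temporary output directory are all illustrative assumptions; scikit-learn's `train_test_split` is a common alternative for the split.

```python
import os
import tempfile
import pandas as pd

# Hypothetical cleaned dataset.
df = pd.DataFrame({
    "id": range(10),
    "feature": [x * 1.5 for x in range(10)],
    "label": [0, 1] * 5,
})

# 80/20 split: sample the training rows, then take the remainder as test.
train = df.sample(frac=0.8, random_state=42)   # fixed seed for repeatability
test = df.drop(train.index)

# Write to a temporary directory here; use your own folder in practice.
out_dir = tempfile.mkdtemp()
cleaned_path = os.path.join(out_dir, "cleaned.csv")
subset_path = os.path.join(out_dir, "positives.csv")

# Save the full dataset without the index column.
df.to_csv(cleaned_path, index=False)

# Save only selected rows and columns.
df.loc[df["label"] == 1, ["id", "label"]].to_csv(subset_path, index=False)

print(len(train), len(test), cleaned_path)
```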

Additional Challenges

• Handle imbalanced datasets by oversampling or undersampling.

• Detect and correct inconsistent data (e.g., inconsistent spellings in text columns).

• Identify and remove columns with high correlation (redundant features).

• Visualize missing data and outliers in the dataset.
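Three of these challenges can be sketched with pandas and NumPy, as below; the visualization task is left out because it needs a plotting library. The data, the spelling map, the 0.9 correlation threshold, and the use of simple random oversampling are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # exactly 2 * x1
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "city": ["Lahore", "lahore ", "LHR", "Karachi", "karachi", "Karachi"],
    "label": [0, 0, 0, 0, 1, 1],
})

# Inconsistent spellings: normalize case and whitespace, then map variants.
df["city"] = df["city"].str.strip().str.lower().replace({"lhr": "lahore"})

# Random oversampling: resample the minority class with replacement
# until it matches the size of the majority class.
counts = df["label"].value_counts()
minority = df[df["label"] == counts.idxmin()]
extra = minority.sample(
    n=counts.max() - len(minority), replace=True, random_state=0
)
balanced = pd.concat([df, extra], ignore_index=True)

# Redundant features: look at the upper triangle of the absolute
# correlation matrix and drop one column from each highly correlated pair.
corr = df[["x1", "x2", "x3"]].corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
reduced = df.drop(columns=to_drop)

print(balanced["label"].value_counts())
print(to_drop)
```

Oversampling should be applied to the training set only, after the train/test split, so that copies of the same row do not appear on both sides.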

Instructions for Students

1. Complete each task on the provided dataset or any dataset of your choice.

2. Document the steps taken for each task.

3. Submit a cleaned dataset and a summary of the preprocessing steps performed.

Intro to ML by Sir Asif Ahsan
