
Data Mining - Introduction

Uploaded by Atif Saeed
Copyright © All Rights Reserved

Data Preprocessing in Data Science

Muhammad Zulqarnain

MS Data Science
FAST-NUCES Islamabad

BS Electrical Computer Engineering
COMSATS Islamabad

Presented by Muhammad Zulqarnain 1


Data preprocessing
Data preprocessing is a crucial step in the data
science pipeline, as it directly impacts the quality
and effectiveness of the models we build.

• It refers to the techniques and procedures used to
transform raw, noisy data into a clean, organized,
structured form suitable for analysis.



Techniques
Data Cleaning
Data Transformation
Data Integration
Data Reduction/Dimensionality Reduction



Data cleaning
Handling Missing Values:
• Statistical measures: mean, median, mode
• Advanced methods
Removing Duplicates:
• Identifying and eliminating duplicate records
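A minimal sketch of both cleaning steps using pandas, assuming a small hypothetical table (the columns `age`, `income` and `city` and their values are invented for illustration). Duplicates are dropped first so that repeated rows do not bias the mean, median and mode used for imputation; the advanced methods are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
raw = pd.DataFrame({
    "age":    [25, np.nan, 35, 25, 40],
    "income": [50_000, 62_000, np.nan, 50_000, 58_000],
    "city":   ["Lahore", "Karachi", None, "Lahore", "Lahore"],
})

# Removing duplicates: drop fully identical rows (rows 0 and 3 here),
# keeping the first occurrence, so repeats do not skew the statistics below.
cleaned = raw.drop_duplicates(keep="first").reset_index(drop=True)

# Handling missing values with simple statistical measures.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].mean())             # mean
cleaned["income"] = cleaned["income"].fillna(cleaned["income"].median())  # median: robust to outliers
cleaned["city"] = cleaned["city"].fillna(cleaned["city"].mode().iloc[0])  # mode: works for categories

print(cleaned)
```

Mean imputation suits roughly symmetric numeric columns, the median is less sensitive to outliers, and the mode is the usual choice for categorical columns.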



Data Transformation

• Normalization/Standardization: Scaling
numerical features to a common scale
– Z-Score Scaling (Standardization: μ = 0, σ = 1)
– Min-Max Scaling (Normalization: range [0, 1])

• Encoding Categorical Variables: Converting
categorical data into numerical form (e.g., one-
hot encoding, label encoding).
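The two scaling formulas are z = (x - μ) / σ for Z-score scaling and x' = (x - min(x)) / (max(x) - min(x)) for Min-Max scaling. The sketch below implements both with NumPy and shows one-hot and label encoding with pandas; the `income` and `colors` data are hypothetical, and σ here is the population standard deviation (NumPy's default, `ddof=0`).

```python
import numpy as np
import pandas as pd

def z_score(x: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and (population) standard deviation 1."""
    return (x - x.mean()) / x.std()

def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical numeric feature.
income = np.array([30_000.0, 45_000.0, 60_000.0, 90_000.0])
z = z_score(income)
mm = min_max(income)

# Hypothetical categorical feature.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(colors, prefix="color", dtype=int)

# Label encoding: each category mapped to an integer code
# (codes follow the sorted category order: blue=0, green=1, red=2).
labels = colors.astype("category").cat.codes
```

Label encoding imposes an arbitrary order (blue < green < red here) that many models treat as meaningful, so one-hot encoding is the more common choice for nominal categories. In practice the scaling parameters (μ, σ, min, max) are computed on the training data only and reused on new data; scikit-learn's `StandardScaler`, `MinMaxScaler` and `OneHotEncoder` follow that fit/transform pattern.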



Data Integration
Combining Data Sources:

• Combining data from multiple sources, such as
files and databases
• Merging the data into a single format.
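A short pandas sketch of integration, assuming two hypothetical monthly sales files with the same columns and a customer table from a separate system: `pd.concat` stacks same-schema sources row-wise, and `DataFrame.merge` joins differently shaped sources on a shared key (`customer_id` and all values are invented for illustration).

```python
import pandas as pd

# Hypothetical sources: two monthly exports with the same schema,
# plus a customer table from a different system.
sales_jan = pd.DataFrame({"customer_id": [1, 2], "amount": [120.0, 75.5]})
sales_feb = pd.DataFrame({"customer_id": [2, 3], "amount": [60.0, 210.0]})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ali", "Sara", "Omar"],
})

# Stack same-schema sources (e.g. multiple files) row-wise.
sales = pd.concat([sales_jan, sales_feb], ignore_index=True)

# Join sources with different schemas on a shared key; validate="many_to_one"
# raises an error if customer_id is not unique in the customers table.
combined = sales.merge(customers, on="customer_id", how="left", validate="many_to_one")
print(combined)
```

`how="left"` keeps every sales row even if a customer is missing from the lookup table (its `name` would then be NaN), and `validate` catches accidental duplicate keys that would otherwise silently multiply rows.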



Data Reduction
Feature Selection:
• Based on feature correlation
Dimensionality Reduction:
• PCA (Principal Component Analysis)
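One possible reading of correlation-based feature selection is dropping a feature that is highly correlated with one already kept, since it adds little new information. The sketch below does that with an assumed |r| > 0.9 threshold, then implements PCA directly with NumPy's SVD; the data, the `drop_correlated` and `pca` helper names, and the threshold are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: "b" is almost a scaled copy of "a", "c" is independent noise.
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

def drop_correlated(frame: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later feature of every pair whose |Pearson r| exceeds threshold (assumed 0.9)."""
    corr = frame.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop)

def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project centred data onto its top principal components via SVD."""
    centred = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

selected = drop_correlated(df)                # drops "b", which duplicates "a"
reduced = pca(df.to_numpy(), n_components=2)  # 3 features -> 2 components
```

PCA is sensitive to feature scale, so it is usually applied after standardization; `sklearn.decomposition.PCA` provides the same projection along with explained-variance ratios.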



Thank You!

