Data Preprocessing

Data preprocessing is the process of cleaning, organizing, and transforming raw data into a format that is suitable for analysis and model training.

Uploaded by techlerner123


DATA PREPROCESSING

WHAT IS DATA PREPROCESSING?

Data preprocessing, a component of data preparation,
describes any type of processing performed on raw data
to prepare it for another data processing procedure. It has
traditionally been an important preliminary step for the
data mining process. More recently, data preprocessing
techniques have been adapted for training machine
learning models and AI models and for running

WHY IS DATA PREPROCESSING IMPORTANT?

• Virtually any type of data analysis, data science or AI
development requires some type of data preprocessing to
provide reliable, precise and robust results for enterprise
applications.
• Real-world data is messy and is often created, processed
and stored by a variety of humans, business processes and
applications.
• As a result, a data set may be missing individual fields,
contain manual input errors, or have duplicate data or
different names to describe the same thing.
• Humans can often identify and rectify these problems in the
data they use in the line of business,
WHAT ARE THE KEY STEPS IN DATA PREPROCESSING?

• Data profiling: Data profiling is the process of examining,
analyzing and reviewing data to collect statistics about its
quality. Data scientists identify data sets and form a hypothesis
of features that might be relevant.
• Data cleansing: The aim here is to find the easiest way to
rectify quality issues, such as eliminating bad data, filling in
missing data or otherwise ensuring the raw data is suitable.
• Data reduction: Raw data sets often include redundant
data that arises from characterizing phenomena in different
ways, or data that is not relevant to a particular ML task.
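
The profiling, cleansing and reduction steps above can be illustrated with a minimal sketch in plain Python. The records, the column names (`id`, `city`, `temp_c`) and the helper functions `profile`, `drop_duplicates` and `drop_columns` are all hypothetical and only serve the illustration; a real project would more likely use a data-frame library.

```python
# Hypothetical raw records; None marks a missing field.
records = [
    {"id": 1, "city": "New York", "temp_c": 21.5},
    {"id": 2, "city": "NYC", "temp_c": None},
    {"id": 2, "city": "NYC", "temp_c": None},  # exact duplicate of the row above
    {"id": 3, "city": "Boston", "temp_c": 18.0},
]

def profile(rows):
    """Data profiling: count missing and distinct values for each column."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report

def drop_duplicates(rows):
    """Data cleansing: keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def drop_columns(rows, unwanted):
    """Data reduction: remove columns judged irrelevant to the task."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

report = profile(records)
cleaned = drop_duplicates(records)
reduced = drop_columns(cleaned, {"id"})
print(report["temp_c"])  # missing and distinct counts for the temperature column
print(len(cleaned))      # number of records left after removing duplicates
```

Profiling comes first here on purpose: the report shows which columns have gaps before any record is changed or dropped.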

• Data transformation: Here, data scientists think about
how different aspects of the data need to be organized to
make the most sense for the goal. This could include
things like structuring unstructured data and combining
salient variables.
• Data enrichment: In this step, data scientists apply the
various feature engineering libraries to the data to effect
the desired transformations. The result should be an
organized data set.
• Data validation: At this stage, the data is split into two
sets. The first set is used to train the model. The second
set is the testing data that is used to gauge the accuracy
and robustness of the resulting model.
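
The validation split described above can be sketched as follows. This is a minimal illustration in plain Python, not a prescribed method: the function name `train_test_split`, the 25% test fraction and the fixed seed are assumptions chosen for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))  # stand-in for 20 preprocessed records
train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set made up only of the last records, which matters when the raw data is ordered by time or by source.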

DATA PREPROCESSING TECHNIQUES

Data cleansing
• Identify and sort out missing data. There are a variety
of reasons a data set might be missing individual fields
of data. In an IoT application that records temperature,
adding in a missing average temperature between the
previous and subsequent record might be a safe fix.
• Reduce noisy data. Real-world data is often noisy,
which can distort an analytic or AI model.
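
Both cleansing techniques above can be sketched in a few lines of plain Python. The readings are made up, and the helper names `fill_gaps` and `median_filter` are hypothetical. The rolling median is only one possible way to reduce noise; the slides do not name a specific method.

```python
from statistics import median

def fill_gaps(readings):
    """Replace an isolated None with the mean of its two neighbours."""
    filled = list(readings)
    for i in range(1, len(filled) - 1):
        prev, nxt = readings[i - 1], readings[i + 1]
        if filled[i] is None and prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2
    return filled

def median_filter(values, window=3):
    """Smooth noise with a centred rolling median; edges use a shorter window.

    Assumes the input no longer contains None values.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed

# Hypothetical IoT temperatures: one missing record and one noisy spike (95.0).
temps = [20.0, None, 22.0, 21.0, 95.0, 21.5, 22.5]
filled = fill_gaps(temps)
smooth = median_filter(filled)
print(filled)
print(smooth)
```

The gap fill only handles a single missing record between two known ones, which matches the temperature example above. Longer gaps, or gaps at the start or end of the series, are left as None and need a separate decision.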


Feature engineering
• Feature scaling. Often, multiple variables change over
different scales, or one will change linearly while another
will change exponentially. Scaling helps to transform the
data in a way that makes it easier for algorithms to tease
apart a meaningful relationship between variables.
• Feature encoding. Another aspect of feature
engineering involves organizing unstructured data into
a structured format. Unstructured data formats can
include text, audio and video.
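
A minimal sketch of both feature engineering ideas in plain Python follows. The sample values and the helper names `min_max_scale`, `log_scale` and `bag_of_words` are hypothetical. Min-max scaling, a log transform and a bag-of-words count are common choices assumed here for illustration; the slides do not prescribe a particular technique.

```python
import math
from collections import Counter

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_scale(values):
    """Compress exponentially growing positive values with a natural log."""
    return [math.log(v) for v in values]

def bag_of_words(texts):
    """Encode each text as word counts over a shared, sorted vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

ages = [20, 30, 40, 60]           # changes roughly linearly
views = [10, 100, 1000, 10000]    # changes exponentially
scaled_ages = min_max_scale(ages)
log_views = log_scale(views)
vocab, vectors = bag_of_words(["sensor too hot", "sensor ok"])
print(scaled_ages)
print(vocab, vectors)
```

After the log transform the exponentially growing values are evenly spaced, so they can be compared with the linearly changing ones on a similar footing. The bag-of-words step turns free text into fixed-length numeric rows that a model can consume.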

THANK YOU
