Data Preprocessing

Data preprocessing is the process of cleaning, organizing, and transforming raw data into a format that is suitable for analysis and model training.

Uploaded by

techlerner123

DATA PREPROCESSING

WHAT IS DATA PREPROCESSING?

Data preprocessing, a component of data preparation,
describes any type of processing performed on raw data
to prepare it for another data processing procedure. It has
traditionally been an important preliminary step for the
data mining process. More recently, data preprocessing
techniques have been adapted for training machine
learning and AI models.

WHY IS DATA PREPROCESSING
IMPORTANT?

• Virtually any type of data analysis, data science or AI
development requires some type of data preprocessing to
provide reliable, precise and robust results for enterprise
applications.
• Real-world data is messy and is often created, processed
and stored by a variety of humans, business processes and
applications.
• As a result, a data set may be missing individual fields,
contain manual input errors, or have duplicate data or
different names to describe the same thing.
• Humans can often identify and rectify these problems in the
data they use in the line of business.
WHAT ARE THE KEY STEPS IN DATA
PREPROCESSING?

• Data profiling. Data profiling is the process of examining,
analyzing and reviewing data to collect statistics about its
quality. Data scientists identify candidate data sets and form
a hypothesis about which features might be relevant.
• Data cleansing. The aim here is to find the easiest way to
rectify quality issues, such as eliminating bad data, filling in
missing data or otherwise ensuring the raw data is suitable.
• Data reduction. Raw data sets often include redundant
data that arises from characterizing phenomena in different
ways, or data that is not relevant to a particular ML task.

• Data transformation. Here, data scientists think about
how different aspects of the data need to be organized to
make the most sense for the goal. This could include
things like structuring unstructured data or combining
salient variables.
• Data enrichment. In this step, data scientists apply the
various feature engineering libraries to the data to effect
the desired transformations. The result should be an
organized data set.
• Data validation. At this stage, the data is split into two
sets. The first set is used to train the model. The second
set is the testing data that is used to gauge the accuracy
and robustness of the resulting model.
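
As a rough illustration of the first and last steps above, the sketch below profiles a small data set (missing values per field, exact duplicate rows) and then splits it into training and testing sets. The records, field names, function names and the 25% test fraction are all hypothetical choices made for this example, not part of the original slides.

```python
import random
from collections import Counter


def profile(rows, fields):
    """Count missing values per field and exact duplicate rows."""
    missing = {
        f: sum(1 for r in rows if r.get(f) in (None, ""))
        for f in fields
    }
    # A row is a duplicate if all listed fields match an earlier row.
    counts = Counter(tuple(r.get(f) for f in fields) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records with a missing city, missing temperatures
# and one exact duplicate.
rows = [
    {"id": 1, "city": "Pune", "temp": 31.0},
    {"id": 2, "city": "Pune", "temp": None},
    {"id": 2, "city": "Pune", "temp": None},
    {"id": 3, "city": "", "temp": 28.5},
]

report = profile(rows, ["id", "city", "temp"])
train, test = split(rows, test_fraction=0.25, seed=0)
print(report)
print(len(train), "training rows,", len(test), "testing rows")
```

In practice the cleansing steps (removing the duplicate, filling the gaps) would usually run before the split; the two functions are shown together here only to keep the example short.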

DATA PREPROCESSING TECHNIQUES

Data cleansing
• Identify and sort out missing data. There are a variety
of reasons a data set might be missing individual fields
of data. In an IoT application that records temperature,
filling in a missing reading with the average of the
previous and subsequent records might be a safe fix.
• Reduce noisy data. Real-world data is often noisy,
which can distort an analytic or AI model.
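
A minimal sketch of both ideas, assuming a list of temperature readings where a missing value is stored as None. The function names, the sample readings and the choice of a three-point moving average for noise reduction are assumptions made for illustration; the slides do not name a specific smoothing method.

```python
def fill_gaps(readings):
    """Replace an isolated None with the mean of its two neighbors."""
    filled = list(readings)
    for i in range(1, len(readings) - 1):
        prev, cur, nxt = readings[i - 1], readings[i], readings[i + 1]
        if cur is None and prev is not None and nxt is not None:
            filled[i] = (prev + nxt) / 2
    return filled


def moving_average(values, window=3):
    """Centered moving average; the edges use a shorter window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical sensor readings: one missing value and one spike.
temps = [20.0, None, 22.0, 30.0, 22.0]

filled = fill_gaps(temps)
smoothed = moving_average(filled)
print(filled)
print(smoothed)
```

Gaps at the start or end of the series, or runs of several missing readings, are left untouched here because they have no pair of known neighbors to average.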


Feature engineering
• Feature scaling. Often, multiple variables change over
different scales, or one will change linearly while another
will change exponentially. Scaling helps to transform the
data in a way that makes it easier for algorithms to tease
apart a meaningful relationship between variables.
• Feature encoding. Another aspect of feature
engineering involves organizing unstructured data into
a structured format. Unstructured data formats can
include text, audio and video.
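
The sketch below shows one common way to put variables on a comparable scale (min-max scaling, with a log transform first for the exponentially growing one) and one simple form of encoding (one-hot encoding of category labels). One-hot encoding is a stand-in chosen for brevity; it handles categorical values rather than raw text, audio or video. All names and sample values are hypothetical.

```python
import math


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """Natural log of each value; assumes all values are positive."""
    return [math.log(v) for v in values]


def one_hot(labels):
    """Return (sorted categories, one 0/1 row per label)."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


ages = [20, 30, 40]                # changes linearly
revenue = [10.0, 100.0, 1000.0]    # changes exponentially
colors = ["red", "green", "red"]   # categorical labels

scaled_ages = min_max_scale(ages)
scaled_revenue = min_max_scale(log_transform(revenue))
categories, encoded = one_hot(colors)

print(scaled_ages)
print(scaled_revenue)
print(categories, encoded)
```

After the log transform, the revenue values are evenly spaced, so both variables end up spread across the same 0 to 1 range.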

THANK YOU
