Data Preprocessing
WHY IS DATA PREPROCESSING
IMPORTANT?
WHAT ARE THE KEY STEPS IN DATA
PREPROCESSING?
• Data transformation. Here, data scientists think about
how different aspects of the data need to be organized to
make the most sense for the goal. This could include
things like structuring unstructured data or combining
salient variables.
• Data enrichment. In this step, data scientists apply the
various feature engineering libraries to the data to effect
the desired transformations. The result should be an
organized data set.
• Data validation. At this stage, the data is split into two
sets. The first set is used to train the model. The second
set is the testing data that is used to gauge the accuracy
and robustness of the resulting model.
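The split described in the validation step can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a prescribed method: the function name `train_test_split`, the 30% test fraction, and the fixed seed are all assumptions chosen for the example.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of records and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled records are held out for testing.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: ten record IDs, 30% held out for testing.
train, test = train_test_split(range(10), test_fraction=0.3)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set made up only of the most recent or last-listed records, which is one reason a split like this is usually randomized.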
DATA PREPROCESSING TECHNIQUES
Data cleansing
• Identify and sort out missing data. There are a variety
of reasons a data set might be missing individual fields
of data. In an IoT application that records temperature,
filling in a missing reading with the average of the
previous and subsequent records might be a safe fix.
• Reduce noisy data. Real-world data is often noisy,
which can distort an analytic or AI model.
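Both cleansing ideas above can be sketched with plain Python lists. The temperature values are made up for illustration, and the function names `fill_missing` and `moving_average` are hypothetical. The moving average is only one possible way to reduce noise; the slides do not name a specific technique.

```python
def fill_missing(readings):
    """Replace each isolated None with the mean of its two neighbours.

    Runs of consecutive missing values are left untouched, because
    averaging across a long gap is harder to justify.
    """
    filled = list(readings)
    for i in range(1, len(readings) - 1):
        prev_value, next_value = readings[i - 1], readings[i + 1]
        if readings[i] is None and prev_value is not None and next_value is not None:
            filled[i] = (prev_value + next_value) / 2
    return filled

def moving_average(values, window=3):
    """Smooth values with a centred moving average (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Hypothetical IoT temperature readings with gaps (None).
temps = [20.0, None, 22.0, 21.5, None, None, 23.0]
filled = fill_missing(temps)
print(filled)

# Hypothetical noisy signal with a spike in the middle.
smoothed = moving_average([1.0, 2.0, 9.0, 2.0, 1.0])
print(smoothed)
```

The single gap is filled with 21.0, while the two-record gap stays as `None` so that it can be handled deliberately rather than silently.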
DATA PREPROCESSING
TECHNIQUES
Feature engineering
Often, multiple variables change over different scales, or
one will change linearly while another will change
exponentially. Scaling helps to transform the data in a way
that makes it easier for algorithms to tease apart a
meaningful relationship between variables.
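As a rough sketch of scaling, the example below puts a linearly growing variable and an exponentially growing one on the same 0 to 1 range. Min-max scaling and the log transform are assumptions chosen for illustration, since the slides do not name a specific method, and the sample values are invented.

```python
import math

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

linear = [10, 20, 30, 40]          # grows by a fixed step
exponential = [1, 10, 100, 1000]   # grows by a fixed factor

scaled_linear = min_max_scale(linear)
# Taking log10 first turns the exponential growth into a straight line,
# so both variables end up evenly spaced on the same scale.
scaled_log = min_max_scale([math.log10(v) for v in exponential])

print(scaled_linear)
print(scaled_log)
```

Without the log step, min-max scaling alone would squeeze the first three exponential values close to 0, which is the kind of distortion scaling is meant to avoid.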
• Feature encoding. Another aspect of feature
engineering involves organizing unstructured data into
a structured format. Unstructured data formats can
include text, audio and video.
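For text, one simple way to impose structure is a bag-of-words table, where each document becomes a row of word counts. This is only an illustrative sketch: the function name `bag_of_words`, the tokenization rule, and the sample messages are assumptions, and audio or video would need different techniques.

```python
import re
from collections import Counter

def bag_of_words(documents):
    """Turn free text into a vocabulary and one row of word counts per document."""
    # Lowercase each document and keep runs of letters, digits and apostrophes.
    tokenized = [re.findall(r"[a-z0-9']+", doc.lower()) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

# Hypothetical free-text status messages.
vocab, rows = bag_of_words(["Sensor too hot", "sensor OK", "too hot, too loud"])
print(vocab)
for row in rows:
    print(row)
```

Each column of the result corresponds to one word of the vocabulary, so the unstructured messages can now be fed to algorithms that expect fixed-length numeric input.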
THANK YOU