Unit 2 Data Preprocessing
Data Preprocessing
• Data preprocessing is the process of preparing raw data and making it
suitable for a machine learning model.
• It is the first and a crucial step in creating a machine learning
model.
• Real-world data generally contains noise and missing values, and
may be in a format that cannot be used directly by
machine learning models.
• Preprocessing cleans the data and makes it suitable for a machine
learning model, which also improves the accuracy and efficiency of
the model.
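As a small illustration of handling missing values, the sketch below fills gaps in a numeric column with the mean of the observed values. Mean imputation is only one possible strategy, chosen here for brevity; the function name `impute_missing` and the sample `ages` column are hypothetical and not part of the course material.

```python
from statistics import mean


def impute_missing(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: the column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical raw column with two missing entries.
ages = [20, None, 30, 40, None]
cleaned = impute_missing(ages)
print(cleaned)
```

Other strategies, such as dropping incomplete rows or filling with the median, may suit a given dataset better.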
Data Transformation
• Data transformation in data mining refers to the process of
converting raw data into a format that is suitable for analysis and
modeling.
• The goal of data transformation is to prepare the data for data mining
so that it can be used to extract useful insights and knowledge.
• Data cleaning: Removing or correcting errors, inconsistencies, and missing
values in the data.
• Data integration: Combining data from multiple sources, such as databases
and spreadsheets, into a single format.
• Data normalization: Scaling the data to a common range of values, such as
between 0 and 1, to facilitate comparison and analysis.
• Data reduction: Reducing the dimensionality of the data by selecting a
subset of relevant features or attributes.
• Data discretization: Converting continuous data into discrete categories or
bins.
• Data aggregation: Combining data at different levels of granularity, such as
by summing or averaging, to create new features or attributes.
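Three of the steps listed above (normalization, discretization, and aggregation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names, the cut points 20 and 40, and the sample `sales` records are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, cut_points, labels):
    """Map a continuous value to a bin label.

    cut_points must be sorted ascending, and labels needs one more entry
    than cut_points. A value equal to a cut point falls into the upper bin.
    """
    return labels[bisect_right(cut_points, value)]


def aggregate_mean(rows, key, field):
    """Average `field` over all rows that share the same `key` value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical sample records.
sales = [
    {"region": "north", "amount": 10},
    {"region": "north", "amount": 30},
    {"region": "south", "amount": 50},
]
amounts = [row["amount"] for row in sales]

normalized = min_max_normalize(amounts)
bins = [discretize(a, [20, 40], ["low", "medium", "high"]) for a in amounts]
regional_mean = aggregate_mean(sales, "region", "amount")

print(normalized)
print(bins)
print(regional_mean)
```

In practice these steps are usually done with a library rather than by hand; the stdlib-only version is used here only to keep the mechanics visible.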