Data Transformation
✔ Most data is collected in raw form, which is difficult to
understand and work with directly.
✔ It is therefore essential to convert this data into a format
that is more usable and understandable.
✔ Data transformation is a technique for converting raw
data into a more appropriate format that enables efficient
data mining and model building.
[Figure: aggregation example. Records in group A (values 6, 15, 3) are summarized with SUM, MEAN and SD.]
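The aggregation suggested by the figure (summarizing a group with SUM, MEAN and SD) can be sketched in Python as follows. This is only an illustrative sketch: the record layout (id, group, value), the function name `aggregate`, and the choice of the sample standard deviation are assumptions, since the figure does not specify them.

```python
import statistics

# Hypothetical records in the form (id, group, value); the values echo the figure.
records = [(1, "A", 6), (2, "A", 15), (4, "A", 3)]

def aggregate(rows):
    """Summarize the value column per group with sum, mean and standard deviation."""
    groups = {}
    for _id, group, value in rows:
        groups.setdefault(group, []).append(value)
    return {
        group: {
            "sum": sum(values),
            "mean": statistics.mean(values),
            # Assumption: sample SD (n - 1 denominator); needs at least two values.
            "sd": statistics.stdev(values),
        }
        for group, values in groups.items()
    }

summary = aggregate(records)
print(summary["A"])
```

Replacing many raw rows with a few summary statistics per group is one common way to make data smaller and easier to mine.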
Data transformation: normalization methods
✔ min-max normalization
✔ z-score normalization
✔ normalization by decimal scaling
Min-max normalization is one of the most common ways to normalize
data. It performs a linear transformation on the original data set.
Suppose that min and max are the minimum and maximum values of
a data set. Min-max normalization maps every value v_i of
the set to v_i' in the range [new_min, new_max] by computing
v_i' = (v_i - min) / (max - min) × (new_max - new_min) + new_min
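As a rough illustration, the standard min-max mapping can be sketched in Python. The function name `min_max_normalize`, the default target range [0, 1], and the sample values are assumptions made for this sketch.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The mapping divides by (max - min), so constant data cannot be scaled.
        raise ValueError("min-max normalization is undefined when all values are equal")
    # v' = (v - min) / (max - min) * (new_max - new_min) + new_min
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Hypothetical income values, for illustration only.
incomes = [12000, 73600, 98000]
normalized = min_max_normalize(incomes)
print(normalized)
```

The smallest value lands on new_min and the largest on new_max. Every other value keeps its relative position between them.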
Z-score normalization rescales each value using the mean and
standard deviation of the data:
New value = (x – μ) / σ
where:
x: Original value
μ: Mean of data
σ: Standard deviation of data
The following example shows how to perform
z-score normalization.
[Table: z-score normalization of CGPA values]
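A minimal sketch of z-score normalization in Python, applying New value = (x - μ) / σ to a list of numbers. The CGPA values below are hypothetical and are not taken from the original example. Using the population standard deviation (`statistics.pstdev`) is also an assumption, since the text does not say whether σ is the population or the sample value.

```python
import statistics

def z_score_normalize(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("z-score normalization is undefined when all values are equal")
    return [(x - mu) / sigma for x in values]

# Hypothetical CGPA values, for illustration only.
cgpa = [6.0, 7.0, 8.0, 9.0]
z_scores = z_score_normalize(cgpa)
print(z_scores)
```

After the transformation the values have mean 0 and standard deviation 1, so attributes measured on different scales become directly comparable.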
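Normalization by decimal scaling divides each value by a power of 10
chosen so that the largest absolute normalized value is less than 1.

Salary bonus | Formula    | Normalized after decimal scaling
400          | 400 / 1000 | 0.4
310          | 310 / 1000 | 0.31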
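The salary bonus figures above can be reproduced with a short Python sketch. It assumes the usual decimal-scaling rule: divide by 10^j, where j is the smallest integer that brings the largest absolute value below 1. The function name `decimal_scale` is hypothetical.

```python
def decimal_scale(values):
    """Divide every value by 10**j, with j the smallest integer giving max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

scaled, j = decimal_scale([400, 310])
print(j, scaled)  # prints: 3 [0.4, 0.31]
```

Here j = 3, so both bonuses are divided by 1000, matching the Formula column.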
Attribute construction: new attributes are created from the given
set of attributes and added to the data to assist the mining
process. This simplifies the original data and makes mining
more efficient.
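As a small illustration of attribute construction, the sketch below derives a new attribute from two existing ones. The record fields (`width`, `height`), the derived `area` attribute, and the function name `add_area` are all hypothetical and chosen only for this example.

```python
# Hypothetical records with two original attributes.
plots = [
    {"width": 10.0, "height": 4.0},
    {"width": 6.5, "height": 2.0},
]

def add_area(rows):
    """Return copies of the rows with a constructed 'area' attribute added."""
    return [{**row, "area": row["width"] * row["height"]} for row in rows]

enriched = add_area(plots)
print(enriched)
```

A mining algorithm can then work with the single `area` attribute instead of having to discover the relationship between width and height itself.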