
Data Transformation

Uploaded by

Cherry
Copyright © All Rights Reserved
DATA TRANSFORMATION
✔ Most data is raw, which makes it challenging to work with
because it is difficult to understand directly.
✔ Therefore, it is essential to convert this data into a format
that is more usable and understandable.
✔ Data transformation is a technique used to convert raw
data into a more appropriate format that enables efficient
data mining and model building.
Data transformation techniques

Smoothing: It is a process used to remove noise from the
dataset using smoothing algorithms. It highlights the important
features present in the dataset and helps in predicting patterns.
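
As one possible illustration of smoothing, the sketch below applies a simple moving average. The slides do not name a specific algorithm, so the choice of moving average, the function name `moving_average`, the window size, and the sample readings are all assumptions made for this example.

```python
def moving_average(values, window=3):
    """Smooth a sequence by averaging each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical readings; the value 30 plays the role of a noisy spike.
noisy = [10, 12, 30, 11, 13, 12, 14]
smoothed = moving_average(noisy, window=3)
print(smoothed)
```

The spike is spread across neighbouring averages instead of standing out as a single extreme value, which is the sense in which noise is reduced here.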
Aggregation: Data collection or aggregation is the method of
storing and presenting data in a summary format. The data may be
obtained from multiple data sources and integrated into a single
description for data analysis. Aggregated data is useful for
everything from decisions about financing or the business
strategy of a product to pricing, operations, and marketing
strategies. For example, sales data may be aggregated to compute
monthly and annual total amounts.
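Aggregation

Day   Item   Quantity
1     A      6
2     A      15
3     B      4
4     A      3
5     C      12
6     D      8
7     D      9
8     C      6

Group by Item (split, apply, combine). The rows for item A form one group:

Day   Item   Quantity
1     A      6
2     A      15
4     A      3

A summary function such as SUM, MEAN, or SD is then applied to each group.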
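
The split, apply, combine steps on the table above can be sketched in plain Python as follows. The variable names are chosen for this example, the item in the day 8 row is assumed to be C, and SD is taken to mean the sample standard deviation, which the slide does not specify.

```python
from collections import defaultdict
from statistics import mean, stdev

# (day, item, quantity) rows from the aggregation table.
rows = [
    (1, "A", 6), (2, "A", 15), (3, "B", 4), (4, "A", 3),
    (5, "C", 12), (6, "D", 8), (7, "D", 9), (8, "C", 6),
]

# Split: collect the quantities for each item.
groups = defaultdict(list)
for day, item, quantity in rows:
    groups[item].append(quantity)

# Apply and combine: compute one summary record per item.
summary = {}
for item, quantities in groups.items():
    summary[item] = {
        "sum": sum(quantities),
        "mean": mean(quantities),
        # A sample standard deviation needs at least two values.
        "sd": stdev(quantities) if len(quantities) > 1 else None,
    }

for item in sorted(summary):
    print(item, summary[item])
```

Item B has a single row, so its standard deviation is left as `None` rather than computed.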
Generalization: It converts low-level data attributes to
high-level data attributes using a concept hierarchy. For example,
age in numerical form (22, 25) is converted into a
categorical value (young, old). Similarly, categorical
attributes such as house addresses may be generalized to
higher-level definitions such as town or country.
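
A minimal sketch of both kinds of generalization follows. The age cutoff of 40, the "street, town, country" address layout, the sample address, and the function names are hypothetical choices for illustration; the slides do not define them.

```python
def generalize_age(age, cutoff=40):
    """Map a numeric age to a category. The cutoff is an assumed threshold."""
    return "young" if age < cutoff else "old"


def generalize_address(address, level="town"):
    """Climb a simple concept hierarchy, assuming 'street, town, country'."""
    street, town, country = [part.strip() for part in address.split(",")]
    return {"street": street, "town": town, "country": country}[level]


ages = [22, 25, 61]
age_groups = [generalize_age(a) for a in ages]
print(age_groups)

address = "12 Hill Road, Springfield, Exampleland"  # made-up address
print(generalize_address(address, "town"))
print(generalize_address(address, "country"))
```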

Normalization: Data normalization involves converting all
data variables into a given range, such as -1.0 to 1.0 or 0 to 1.
Common methods are:

✔ min-max normalization
✔ z-score normalization
✔ normalization by decimal scaling
Min-max normalization is one of the most common ways to normalize
data. It performs a linear transformation on the original data set.
If min and max are the minimum and maximum values of
a data set, min-max normalization maps every value vi of
the set to vi' in the range [new_min, new_max] by computing

vi' = (vi – min) / (max – min) × (new_max – new_min) + new_min

Let's take an example. Suppose we have to normalize the data
set 200, 300, 400, 600, 1000 to the new range [0, 1]. Using min-max
normalization with
min = 200, max = 1000, new_min = 0, new_max = 1
the normalized data set is: 0, 0.125, 0.25, 0.5, 1
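
The worked example can be checked with a short sketch. Each value is shifted by the minimum, divided by the original spread (max minus min), scaled by the width of the new range, and shifted by new_min. The function name `min_max_normalize` and its default arguments are assumptions for this example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal, so the range is zero")
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


data = [200, 300, 400, 600, 1000]
normalized = min_max_normalize(data)
print(normalized)  # [0.0, 0.125, 0.25, 0.5, 1.0]
```

Passing `new_min=-1.0, new_max=1.0` maps the same data onto the range -1 to 1 instead.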
z-score normalization

New value = (x – μ) / σ

where:

x: Original value
μ: Mean of data
σ: Standard deviation of data
The following example shows how to perform
z-score normalization.

The mean of the dataset is 21.2 and the
standard deviation is 29.8.
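
The formula New value = (x – μ) / σ can be sketched as below. The sample data here is hypothetical and is not the dataset from the slide; it was picked so that the mean is 5 and the standard deviation is 2. The sketch also assumes the population standard deviation (`pstdev`), since the slides do not say whether the population or sample form is intended.

```python
from statistics import mean, pstdev


def z_score_normalize(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption of this sketch
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, SD 2
z_scores = z_score_normalize(data)
print(z_scores)
```

After the transformation the values have mean 0 and standard deviation 1, so each result reads as "how many standard deviations from the mean".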
Decimal Scaling Formula
A value vi of attribute A can be normalized by the following
formula:

Normalized value of attribute = vi / 10^j

where j is the smallest integer such that the largest absolute
normalized value is less than 1.


CGPA   Formula   Normalized CGPA after decimal scaling
2      2/10      0.2
3      3/10      0.3

Salary bonus   Formula      Normalized salary bonus after decimal scaling
400            400 / 1000   0.4
310            310 / 1000   0.31
Attribute Construction: New attributes are created from the
given set of attributes and applied to assist the mining process.
This simplifies the original data and makes the mining
more efficient.
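
As a small illustration, the sketch below constructs a new attribute, area, from two existing attributes. The records and the attribute names `width`, `length`, and `area` are hypothetical and do not come from the slides.

```python
# Hypothetical records with two original attributes each.
records = [
    {"width": 4, "length": 5},
    {"width": 3, "length": 7},
]

# Construct a new attribute from the existing ones.
for record in records:
    record["area"] = record["width"] * record["length"]

print(records)
```

A mining step that cares about overall size can now use the single constructed attribute instead of reasoning over two separate ones.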
