
Lecture 10 - Data Transformation-M

The document discusses data preprocessing techniques essential for data mining, including data cleaning, integration, reduction, transformation, and discretization. It elaborates on various data transformation strategies such as normalization, smoothing, and aggregation, emphasizing the importance of normalization for clustering and distance measures. Additionally, it provides methods for normalization, including min-max, z-score, and decimal scaling, along with practice questions for further understanding.

Uploaded by

gihel53025

CS06504

Data Mining
Lecture # 9
Data Preprocessing
Transformation and
Discretization
(Ch # 3)
Data Preprocessing
 Why preprocess the data?
 Data cleaning
 Data integration
 Data reduction
 Data Transformation and Discretization
 Summary
Data Transformation
 The data are transformed or consolidated into forms appropriate for mining.
Strategies include:
 Smoothing – removes noise from the data using binning, regression, or clustering.
 Attribute construction – new attributes are constructed from the given set of attributes.
 Aggregation – summary or aggregation operations are applied, e.g. construction of a data cube.
 Normalization – data are scaled so as to fall within a smaller range, e.g. -1.0 to 1.0 or 0.0 to 1.0.
 Discretization – values of a numeric attribute are replaced by interval labels or conceptual labels (a concept hierarchy for the numeric attribute).
 Concept hierarchy generation for nominal data – nominal attribute values are generalized to higher-level concepts, e.g. street is generalized to block, city or
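As a minimal sketch of two of the strategies above, the snippet below smooths a small list by equal-frequency bin means and then replaces numeric values with interval labels. The data values, the bin size, the cut points, and the function names `smooth_by_bin_means` and `discretize` are illustrative assumptions, not part of the lecture.

```python
def smooth_by_bin_means(values, bin_size):
    """Smoothing by bin means: sort the values, split them into
    equal-frequency bins, and replace each value by its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def discretize(value, edges, labels):
    """Discretization: return the label of the half-open interval
    [edges[i], edges[i+1]) that contains the value."""
    for low, high, label in zip(edges, edges[1:], labels):
        if low <= value < high:
            return label
    raise ValueError(f"{value} is outside the given intervals")


# Hypothetical price data, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, bin_size=3))

# Hypothetical cut points and conceptual labels for an 'age' attribute.
edges = [0, 18, 65, 120]
labels = ["youth", "adult", "senior"]
print([discretize(age, edges, labels) for age in (12, 40, 70)])
```

Half-open intervals are used so that every value falls into exactly one label; other boundary conventions are equally valid.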
Data Normalization
 A database can contain any number of continuous-type attributes.
 A continuous-type attribute with a larger range, or noise, can distort the distance between objects.
 (Remember: the similarity of all continuous-type attributes is checked using a single Euclidean distance formula.)
 For example, the 'Income' attribute can dominate the distance compared with the 'Weight' and 'Age' attributes.
 The objective of normalization is to convert all continuous-type attributes so that their values fall within a small specified range, such as 0.0 to 1.0.
 Normalization is particularly useful for clustering and for distance-based algorithms such as k-nearest-neighbor.
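The effect of an unscaled attribute on Euclidean distance can be sketched as follows. The three records and the helper names `euclidean` and `min_max_columns` are hypothetical, chosen only to show 'Income' swamping 'Weight' and 'Age' before rescaling.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_columns(rows):
    """Rescale every column of a list of records to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical records: (income, weight in kg, age in years)
people = [
    (30_000, 60, 25),  # A
    (31_000, 95, 60),  # B: income close to A, weight and age far from A
    (60_000, 61, 26),  # C: weight and age close to A, income far from A
]
a, b, c = people

# On raw values income dominates, so B looks nearer to A than C does.
print(euclidean(a, b) < euclidean(a, c))      # True

# After rescaling, all three attributes contribute on the same scale,
# and C (which differs from A in one attribute only) becomes the nearer one.
na, nb, nc = min_max_columns(people)
print(euclidean(na, nc) < euclidean(na, nb))  # True
```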
Data Normalization
 min-max normalization

$$v' = \frac{v - \mathit{min}_A}{\mathit{max}_A - \mathit{min}_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$

 Example: Suppose that the minimum and maximum values for the attribute income are 12,000 and 98,000. By min-max normalization to the range [0.0, 1.0], a value of 73,600 for income is transformed to
» ((73,600 - 12,000) / (98,000 - 12,000)) (1.0 - 0) + 0 = 0.716
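The income example can be checked with a short sketch. The function name `min_max` and its parameter names are assumptions made for illustration; the arithmetic follows the min-max formula on this slide.

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Min-max normalization of a single value v of attribute A
    into the target range [new_min, new_max]."""
    if max_a == min_a:
        raise ValueError("min_a and max_a must differ")
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min


# Income example: min 12,000, max 98,000, value 73,600, target range [0, 1].
print(round(min_max(73_600, 12_000, 98_000), 3))  # 0.716
```

The guard against `max_a == min_a` is an added safety check: the formula is undefined when every value of the attribute is the same.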
Data Normalization
 z-score normalization
This method of normalization is useful when the actual minimum and maximum of an attribute are unknown, or when outliers would dominate the min-max normalization.

$$v' = \frac{v - \mathit{mean}_A}{\mathit{stand\_dev}_A}$$

 Decimal scaling normalization

$$v' = \frac{v}{10^{j}}$$

where j is the smallest integer such that max(|v'|) < 1.
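Both formulas can be sketched in a few lines. The function names are hypothetical, and two choices here are assumptions because the slide does not specify them: the sample standard deviation (`statistics.stdev`) is used for the z-score, and the search for j starts at 0, so j is never negative.

```python
import statistics


def z_score(values):
    """z-score normalization: subtract the mean and divide by the
    standard deviation (sample standard deviation is an assumption;
    statistics.pstdev would give the population version)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, where j is the smallest
    non-negative integer such that max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


print(z_score([2, 4, 6]))          # mean 4, sample std 2

scaled, j = decimal_scaling([-986, 917])
print(j, scaled)                   # j is 3 for these values
```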
Building mineable data sets
Data Transformation: Normalization
 min-max normalization

$$v' = \frac{v - \mathit{min}_A}{\mathit{max}_A - \mathit{min}_A}\,(\mathit{new\_max}_A - \mathit{new\_min}_A) + \mathit{new\_min}_A$$

 z-score normalization

$$v' = \frac{v - \mathit{mean}_A}{\mathit{stand\_dev}_A}$$

 normalization by decimal scaling

$$v' = \frac{v}{10^{j}}$$

where j is the smallest integer such that max(|v'|) < 1.
Price in €      4     6     14    16    18    19    21    22    23    24    27    34
Min-max [0,1]   0     .06   .33   .4    .46   .5    .56   .6    .63   .66   .76   1
Z-score         -1.8  -1.6  -0.6  -0.3  -0.1  0     0.2   0.4   0.5   0.6   1     1.8
Decimal         .04   .06   .14   .16   .18   .19   .21   .22   .23   .24   .27   .34
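The table rows can be recomputed from the price list with the sketch below. The sample standard deviation is an assumption, chosen because it reproduces the table's extreme z-scores of -1.8 and 1.8; the digit-count shortcut for j is also an assumption and only works for integer data.

```python
import statistics

prices = [4, 6, 14, 16, 18, 19, 21, 22, 23, 24, 27, 34]

# Min-max normalization to [0, 1].
lo, hi = min(prices), max(prices)
min_max = [(p - lo) / (hi - lo) for p in prices]

# z-score normalization (sample standard deviation is an assumption).
mean = statistics.mean(prices)
std = statistics.stdev(prices)
z_scores = [(p - mean) / std for p in prices]

# Decimal scaling: for integer data, j equals the number of digits
# in the largest absolute value.
j = len(str(max(abs(p) for p in prices)))
decimal = [p / 10 ** j for p in prices]

for name, row in [("min-max", min_max), ("z-score", z_scores), ("decimal", decimal)]:
    print(f"{name:8}", " ".join(f"{x:6.2f}" for x in row))
```

Some printed entries differ from the table in the last digit (for example 0.07 rather than .06 for the price 6), which suggests that the slide truncated some values instead of rounding them.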
Practice Questions
 Solve Exercise Questions 3.3, 3.6, 3.7, 3.8
 Find the PCA for the following data using the correlation matrix:

x 2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1
y 2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9
