Lecture 2.3 Data Normalization
Feature Normalization
• Feature normalization is a preprocessing step used to normalize the range
of the features.
• Motivation:
– Suppose that some ML algorithm computes the Euclidean distance between two
points. If one of the features has a broad range of values, the distance will be
governed by this particular feature. Therefore, the range of all features should be
normalized so that each feature contributes approximately proportionately to
the final distance.
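– Example: the first feature differs by 0.41 and the second by 3, so the second feature accounts for almost all of the distance:
$$\left\lVert \begin{pmatrix} 0.11 \\ 182 \end{pmatrix} - \begin{pmatrix} 0.52 \\ 179 \end{pmatrix} \right\rVert = \sqrt{(0.11 - 0.52)^2 + (182 - 179)^2} \approx 3.028$$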
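The distance example above can be checked with a short sketch in plain Python. The helper name `euclidean_distance` and the variable names are illustrative assumptions, not part of the lecture; the two points are the ones from the example.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The two points from the example: a small-range feature and a large-range feature
p = (0.11, 182.0)
q = (0.52, 179.0)

dist = euclidean_distance(p, q)

# Fraction of the squared distance contributed by each feature
sq_diffs = [(a - b) ** 2 for a, b in zip(p, q)]
shares = [d / sum(sq_diffs) for d in sq_diffs]

print(f"distance = {dist:.4f}")
print(f"share of feature 1 = {shares[0]:.3f}, share of feature 2 = {shares[1]:.3f}")
```

The second feature contributes more than 98% of the squared distance, which is the imbalance that normalization is meant to remove.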
Min-Max Feature Scaling
$$v'_j = \frac{v_j - \min(v_j)}{\max(v_j) - \min(v_j)}$$
• $v_j$ is a column (corresponding to feature $j$) from the data matrix $X$.
• $v'_j$ are the normalized values of feature $j$. These values lie in $[0, 1]$.
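A minimal sketch of min-max scaling for a single feature column, in plain Python. The function name `min_max_scale` and the sample values are hypothetical, and mapping a constant column to zeros is an assumption made here to avoid division by zero; the lecture does not cover that case.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] with min-max scaling."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical feature column (not data from the lecture)
heights = [182.0, 179.0, 165.0, 190.0]
scaled = min_max_scale(heights)

print(scaled)
```

The smallest value in the column maps to 0 and the largest to 1, so every scaled feature spans the same range.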
[Figure: the data before feature scaling (left) and after feature scaling (right).]
Feature Standardization
$$v'_j = \frac{v_j - \operatorname{mean}(v_j)}{\operatorname{stdev}(v_j)}$$
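• $v_j$ is a column (corresponding to feature $j$) from the data matrix $X$.
• $v'_j$ are the standardized values of feature $j$. These values have mean 0 and standard deviation 1; unlike min-max scaling, they are not confined to $[0, 1]$ and can be negative or greater than 1.
• To standardize, we just subtract the mean and divide by the standard deviation.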
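A minimal sketch of standardization for a single feature column, using only the Python standard library. The function name `standardize` and the sample values are hypothetical. The lecture does not say whether the standard deviation is the population or the sample version; the population version (`statistics.pstdev`) is assumed here.

```python
import statistics

def standardize(column):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(column)
    # Assumption: population standard deviation; the lecture does not specify which
    std = statistics.pstdev(column)
    if std == 0:
        # Assumption: a constant feature is mapped to zeros to avoid division by zero
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

# Hypothetical feature column (not data from the lecture)
heights = [182.0, 179.0, 165.0, 190.0]
z = standardize(heights)

print([round(v, 3) for v in z])
print("mean:", round(statistics.mean(z), 6), "stdev:", round(statistics.pstdev(z), 6))
```

The result has mean 0 and standard deviation 1, and some values fall below 0 or above 1, which is the main practical difference from min-max scaling.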
[Figure: the data before standardization (left) and after standardization (right).]
• NOTE: do not rescale or standardize the output (target variable).
Min-max scaling:
$$v'_j = \frac{v_j - \min(v_j)}{\max(v_j) - \min(v_j)}$$
Standardization:
$$v'_j = \frac{v_j - \operatorname{mean}(v_j)}{\operatorname{stdev}(v_j)}$$