Feature Scaling Techniques: Machine Learning
Taken from Wikipedia:
Feature scaling is a method used to normalize the
range of independent variables or features of data.
In data processing, it is also known as data
normalization and is generally performed during the
data pre-processing step.
• Since the range of values of raw data varies widely, in
some machine learning algorithms, objective functions will
not work properly without normalization. For example,
many classifiers calculate the distance between two points using
the Euclidean distance. If one of the features has a broad range
of values, the distance will be governed by this particular
feature. Therefore, the range of all features should be
normalized so that each feature contributes approximately
proportionately to the final distance.
• Another reason why feature scaling is applied is that gradient
descent converges much faster with feature scaling than
without it.
So the question arises… which algorithms need
feature scaling?
StandardScaler
• For each feature X, we calculate the mean (Xm) and the standard deviation (Xs).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xm) / Xs
• Xm = 18.71
• Xs = 13.46

original   scaled
 5         -1.01857
10         -0.64710
12         -0.49851
14         -0.34993
18         -0.05275
23          0.318722
49          2.250371
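As a sketch, the table above can be reproduced with scikit-learn (assuming the deck's seven-value example feature; small differences in the last decimals come from the slide rounding Xm and Xs):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The deck's example feature, as a single-column array.
X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = StandardScaler()  # z = (x - mean) / std (population std)
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)       # fitted mean, ~18.71
print(scaler.scale_)      # fitted std, ~13.46
print(X_scaled.ravel())   # z-scores; the outlier 49 maps to ~2.25
```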
MinMaxScaler
• Scales the range of each feature to between 0 and 1.
• Can work well on data that is NOT normally
distributed (bell-shaped).
• Does NOT perform well on features that have
outliers.
• Does not change the shape of the feature's distribution.
MinMaxScaler
• For each feature X, we calculate the minimum value (Xmin) and the maximum value (Xmax).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xmin) / (Xmax – Xmin)
• Xmin = 5
• Xmax = 49

original   scaled
 5         0
10         0.113636
12         0.159091
14         0.204545
18         0.295455
23         0.409091
49         1
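A minimal scikit-learn sketch of the same calculation (assuming the deck's example feature):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = MinMaxScaler()  # (x - min) / (max - min), default output range [0, 1]
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # 5 maps to 0, 49 maps to 1
```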
RobustScaler
• Useful when a feature has marginal outliers.
• Subtracts the median rather than the mean.
• Does not take the min and max values into account;
instead uses the interquartile range (IQR).
• Hence it is generally “robust” to outliers.
• But it will not completely remove outliers.
• Can be used when neither StandardScaler nor
MinMaxScaler is appropriate, due to the presence of
outliers.
• Does little to change the shape of a feature's distribution.
RobustScaler
• For each feature X, we calculate the median (Xmd)
and two quartiles (X0.25 and X0.75).
• (X0.75 – X0.25) is called the interquartile range (IQR).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xmd) / (X0.75 – X0.25)
• Xmd = 14
• X0.25 = 11
• X0.75 = 20.5

original   scaled
 5         -0.94737
10         -0.42105
12         -0.21053
14          0
18          0.421053
23          0.947368
49          3.684211
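A short scikit-learn sketch (assuming the same example feature); the default settings center on the median and scale by the 25th–75th percentile range, matching the table:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# Default: center on the median, scale by the IQR (25th-75th percentile).
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.center_)   # median: 14
print(scaler.scale_)    # IQR: 20.5 - 11 = 9.5
print(X_scaled.ravel())
```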
MaxAbsScaler
• Not so useful when the feature has outliers.
• For each feature X, we calculate the max value
(Xmax) in absolute terms.
• For each value in that feature X (Xi), calculate:
• New Xi = Xi / Xmax
• Xmax = |−49| = 49

original   scaled
 -5        -0.10204
 10         0.204082
 12         0.244898
-14        -0.28571
 18         0.367347
 23         0.469388
-49        -1
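A minimal scikit-learn sketch of the table above (this example uses the deck's signed variant of the feature, since MaxAbsScaler is mainly interesting with negative values):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Example with negative values, as in the table above.
X = np.array([[-5], [10], [12], [-14], [18], [23], [-49]], dtype=float)

scaler = MaxAbsScaler()  # divides each feature by its max absolute value
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)   # 49
print(X_scaled.ravel())  # -49 maps to -1
```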
PowerTransformer
• Useful when the desired output should be more Gaussian-like.
• Currently offers the ‘Box-Cox’ and ‘Yeo-Johnson’
transforms.
• Box-Cox requires the input data to be strictly positive
(not even zero is acceptable).
• For features that contain zeros or negative values,
Yeo-Johnson comes to the rescue.
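A short sketch with scikit-learn (assuming the same seven-value example feature); by default PowerTransformer also standardizes the output to zero mean and unit variance:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# 'yeo-johnson' (the default) also handles zero and negative inputs;
# 'box-cox' requires strictly positive data.
pt = PowerTransformer(method='yeo-johnson')
X_gauss = pt.fit_transform(X)

print(pt.lambdas_)      # fitted power parameter per feature
print(X_gauss.ravel())  # standardized, more Gaussian-like output
```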
QuantileTransformer
• Useful when feature has outliers.
• This method transforms the features to follow a
uniform or a normal distribution. Therefore, for a
given feature, this transformation tends to spread
out the most frequent values. It also reduces the
impact of (marginal) outliers: this is therefore a
robust preprocessing scheme.
• With output_distribution=‘normal’, it instead makes the data more Gaussian-like.
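A minimal scikit-learn sketch (assuming the same example feature; n_quantiles must not exceed the number of samples). Note how the outlier 49 lands at 1, no further from 23 than 23 is from 18:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# n_quantiles must not exceed the number of samples.
qt = QuantileTransformer(n_quantiles=7, output_distribution='uniform')
X_uniform = qt.fit_transform(X)

print(X_uniform.ravel())  # evenly spaced quantile ranks in [0, 1]
```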
Normalizer
• Performs row-wise calculations, instead of the
column-wise scaling we’ve been seeing all along.
• Useful for clustering and text-classification tasks.
• Can use the l1 (Manhattan) or l2 (Euclidean) norm via the
“norm” parameter.
• There is also norm=‘max’, which scales each row by
simply dividing every element by the maximum (absolute)
value in that entire row.
Normalizer
• Row norms for two example rows (the divisor applied to each row for each option):

norm      max      l1       l2
row 1     680      800.69   688.5
row 2     495      587.69   501.58
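A minimal sketch of row-wise scaling with scikit-learn (the two rows here are hypothetical small values chosen for illustration, not the deck's example rows):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two hypothetical sample rows; Normalizer rescales each ROW, not each column.
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

for norm in ('l1', 'l2', 'max'):
    X_n = Normalizer(norm=norm).fit_transform(X)
    print(norm, X_n)
```

With norm='l2', every row ends up with unit Euclidean length, so [3, 4] becomes [0.6, 0.8].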
Square root transformer
• For each value in that feature X (Xi), calculate:
• New Xi = √Xi

original   scaled
 5         2.236
10         3.162
12         3.464
14         3.742
18         4.243
23         4.796
49         7
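One way to sketch this in scikit-learn (assuming the same example feature) is FunctionTransformer, which wraps any element-wise function as a transformer:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# FunctionTransformer applies the given function element-wise.
sqrt_transformer = FunctionTransformer(np.sqrt)
X_sqrt = sqrt_transformer.fit_transform(X)

print(X_sqrt.ravel())  # square roots of each value; 49 maps to 7
```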
References
• https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
• https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
• https://www.quora.com/Which-machine-algorithms-require-data-scaling-normalization
• https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/