Feature Scaling Techniques: Machine Learning
Taken from Wikipedia:
Feature scaling is a method used to normalize the
range of independent variables or features of data.
In data processing, it is also known as data
normalization and is generally performed during the
data pre-processing step.
• Since the range of values of raw data varies widely, in
some machine learning algorithms, objective functions will
not work properly without normalization. For example,
many classifiers calculate the distance between two points using
the Euclidean distance. If one of the features has a broad range
of values, the distance will be governed by this particular
feature. Therefore, the range of all features should be
normalized so that each feature contributes approximately
proportionately to the final distance.
• Another reason why feature scaling is applied is that gradient
descent converges much faster with feature scaling than
without it.
So the question arises… which algorithms need
feature scaling?
StandardScaler
• For each feature X, we calculate the mean (Xm) and the standard deviation (Xs).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xm) / Xs
• Xm = 18.71
• Xs = 13.46

original   scaled
 5         -1.01857
10         -0.64710
12         -0.49851
14         -0.34993
18         -0.05275
23          0.318722
49          2.250371
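As a sketch, the table above can be reproduced with scikit-learn (assuming the deck's seven-value example feature; small differences in the last decimals come from the slide rounding Xm and Xs):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The deck's example feature, as a single-column array.
X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = StandardScaler()  # z = (x - mean) / std (population std)
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)       # fitted mean, ~18.71
print(scaler.scale_)      # fitted std, ~13.46
print(X_scaled.ravel())   # z-scores; the outlier 49 maps to ~2.25
```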
MinMaxScaler
• Scales the range of each feature to between 0 and 1.
• Can work well on data that is NOT normally
distributed (bell-shaped).
• Does NOT perform well on features that have
outliers.
• Does not change the shape of the feature's distribution.
MinMaxScaler
• For each feature X, we calculate the minimum value (Xmin) and the maximum value (Xmax).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xmin) / (Xmax – Xmin)
• Xmin = 5
• Xmax = 49

original   scaled
 5         0
10         0.113636
12         0.159091
14         0.204545
18         0.295455
23         0.409091
49         1
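A minimal scikit-learn sketch of the same calculation (assuming the deck's example feature):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = MinMaxScaler()  # (x - min) / (max - min), default output range [0, 1]
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # 5 maps to 0, 49 maps to 1
```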
RobustScaler
• Useful when a feature has marginal outliers.
• Subtracts the median rather than the mean.
• Does not take the min and max values into account;
instead uses the interquartile range (IQR).
• Hence it is generally “robust” to outliers.
• But it will not completely remove outliers.
• Can be used when neither StandardScaler nor
MinMaxScaler is appropriate, due to the presence of
outliers.
• Does little to change the shape of a feature's distribution.
RobustScaler
• For each feature X, we calculate the median (Xmd)
and two quartiles (X0.25 and X0.75).
• (X0.75 – X0.25) is called the interquartile range (IQR).
• For each value in that feature X (Xi), calculate:
• New Xi = (Xi – Xmd) / (X0.75 – X0.25)
• Xmd = 14
• X0.25 = 11
• X0.75 = 20.5

original   scaled
 5         -0.94737
10         -0.42105
12         -0.21053
14          0
18          0.421053
23          0.947368
49          3.684211
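A short scikit-learn sketch (assuming the same example feature); the default settings center on the median and scale by the 25th–75th percentile range, matching the table:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# Default: center on the median, scale by the IQR (25th-75th percentile).
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.center_)   # median: 14
print(scaler.scale_)    # IQR: 20.5 - 11 = 9.5
print(X_scaled.ravel())
```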
MaxAbsScaler
• Not so useful when the feature has outliers.
• For each feature X, we calculate the max value
(Xmax) in absolute terms.
• For each value in that feature X (Xi), calculate:
• New Xi = Xi / Xmax
• Xmax = |−49| = 49

original   scaled
 -5        -0.10204
 10         0.204082
 12         0.244898
-14        -0.28571
 18         0.367347
 23         0.469388
-49        -1
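A minimal scikit-learn sketch of the table above (this example uses the deck's signed variant of the feature, since MaxAbsScaler is mainly interesting with negative values):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Example with negative values, as in the table above.
X = np.array([[-5], [10], [12], [-14], [18], [23], [-49]], dtype=float)

scaler = MaxAbsScaler()  # divides each feature by its max absolute value
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)   # 49
print(X_scaled.ravel())  # -49 maps to -1
```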
PowerTransformer
• Useful when the desired output should be more Gaussian-like.
• Currently offers the ‘Box-Cox’ and ‘Yeo-Johnson’
transforms.
• Box-Cox requires the input data to be strictly positive
(not even zero is acceptable).
• For features that contain zeros or negative values,
Yeo-Johnson comes to the rescue.
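A short sketch with scikit-learn (assuming the same seven-value example feature); by default PowerTransformer also standardizes the output to zero mean and unit variance:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# 'yeo-johnson' (the default) also handles zero and negative inputs;
# 'box-cox' requires strictly positive data.
pt = PowerTransformer(method='yeo-johnson')
X_gauss = pt.fit_transform(X)

print(pt.lambdas_)      # fitted power parameter per feature
print(X_gauss.ravel())  # standardized, more Gaussian-like output
```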
QuantileTransformer
• Useful when feature has outliers.
• This method transforms the features to follow a
uniform or a normal distribution. Therefore, for a
given feature, this transformation tends to spread
out the most frequent values. It also reduces the
impact of (marginal) outliers: this is therefore a
robust preprocessing scheme.
• With output_distribution=‘normal’, it instead makes the data more Gaussian-like.
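A minimal scikit-learn sketch (assuming the same example feature; n_quantiles must not exceed the number of samples). Note how the outlier 49 lands at 1, no further from 23 than 23 is from 18:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# n_quantiles must not exceed the number of samples.
qt = QuantileTransformer(n_quantiles=7, output_distribution='uniform')
X_uniform = qt.fit_transform(X)

print(X_uniform.ravel())  # evenly spaced quantile ranks in [0, 1]
```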
Normalizer
• Performs row-wise calculations, instead of the
column-wise scaling we’ve been seeing all along.
• Useful for clustering and text-classification tasks.
• Can use the l1 (Manhattan) or l2 (Euclidean) norm via the
“norm” parameter.
• There is also norm=‘max’, which scales each row by
simply dividing every element by the maximum (absolute)
value in that entire row.
Normalizer
• Row norms for two example rows (the divisor applied to each row for each option):

norm      max      l1       l2
row 1     680      800.69   688.5
row 2     495      587.69   501.58
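A minimal sketch of row-wise scaling with scikit-learn (the two rows here are hypothetical small values chosen for illustration, not the deck's example rows):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two hypothetical sample rows; Normalizer rescales each ROW, not each column.
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

for norm in ('l1', 'l2', 'max'):
    X_n = Normalizer(norm=norm).fit_transform(X)
    print(norm, X_n)
```

With norm='l2', every row ends up with unit Euclidean length, so [3, 4] becomes [0.6, 0.8].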
Square root transformer
• For each value in that feature X (Xi), calculate:
• New Xi = √Xi

original   scaled
 5         2.236
10         3.162
12         3.464
14         3.742
18         4.243
23         4.796
49         7
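One way to sketch this in scikit-learn (assuming the same example feature) is FunctionTransformer, which wraps any element-wise function as a transformer:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# FunctionTransformer applies the given function element-wise.
sqrt_transformer = FunctionTransformer(np.sqrt)
X_sqrt = sqrt_transformer.fit_transform(X)

print(X_sqrt.ravel())  # square roots of each value; 49 maps to 7
```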
References
• https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
• https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
• https://www.quora.com/Which-machine-algorithms-require-data-scaling-normalization
• https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/