
Feature Scaling Techniques

Machine Learning
Taken from Wikipedia:
Feature scaling is a method used to normalize the
range of independent variables or features of data.
In data processing, it is also known as data
normalization and is generally performed during the
data pre-processing step.
• Since the range of values of raw data varies widely, in
some machine learning algorithms, objective functions will
not work properly without normalization. For example,
many classifiers calculate the distance between two points using
the Euclidean distance. If one of the features has a broad range
of values, the distance will be governed by this particular
feature. Therefore, the range of all features should be
normalized so that each feature contributes approximately
proportionately to the final distance.
• Another reason why feature scaling is applied is that gradient
descent converges much faster with feature scaling than
without it.
So the question arises… which algorithms need
feature scaling?

• Linear-based models, and ones which calculate
distances as part of their algorithm, need data
to be scaled.
• For example: Linear and Logistic Regression,
SVMs, PCA, LDA, KNN, K-Means clustering
And which algorithms do NOT need feature
scaling?

• Tree-based models, which essentially ask
“inequality”-based questions to make splits at
each node, do NOT need data to be scaled.
• For example, Decision Trees, Random Forests,
Gradient Boosting Trees etc.
Different Feature Scaling Techniques:
• StandardScaler
• MinMaxScaler
• RobustScaler
• MaxAbsScaler
• PowerTransformer
• QuantileTransformer
• Normalizer
StandardScaler
• Useful when the feature follows a normal-like
distribution, not so much otherwise.
• Scales the feature to have zero mean and a
standard deviation of one, giving it the feel and
properties of a “standard” normal distribution.
• Does NOT perform well on features that have
outliers.
• Does not change the shape of the feature’s
distribution.
StandardScaler
• For each feature X, we calculate the mean (Xm)
and standard deviation (Xs).
• For each value Xi in that feature X, calculate:
  New Xi = (Xi − Xm) / Xs
• For the example column below, Xm = 18.71 and Xs = 13.46:

original    scaled
5           -1.01857
10          -0.6471
12          -0.49851
14          -0.34993
18          -0.05275
23          0.318722
49          2.250371
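A minimal sketch of the same calculation with scikit-learn’s StandardScaler, using the seven example values from the table above (outputs agree with the table up to rounding):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The example column from the table above, as a single-feature 2-D array.
X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)      # [18.714...] -> Xm
print(scaler.scale_)     # [13.456...] -> Xs (population standard deviation)
print(X_scaled.ravel())  # approx. [-1.019 -0.648 -0.499 -0.350 -0.053  0.319  2.251]
```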
MinMaxScaler
• Scales the range of features between 0 and 1.
• Could work well on data that is NOT normally
distributed (bell-shaped).
• Does NOT perform well on features that have
outliers.
• Does not change the shape of the feature’s distribution.
MinMaxScaler
• For each feature X, we calculate the minimum value
(Xmin) and maximum value (Xmax).
• For each value Xi in that feature X, calculate:
  New Xi = (Xi − Xmin) / (Xmax − Xmin)
• For the example column below, Xmin = 5 and Xmax = 49:

original    scaled
5           0
10          0.113636
12          0.159091
14          0.204545
18          0.295455
23          0.409091
49          1
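The same column run through scikit-learn’s MinMaxScaler, as a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = MinMaxScaler()  # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)

print(scaler.data_min_, scaler.data_max_)  # [5.] [49.] -> Xmin, Xmax
print(X_scaled.ravel())  # [0.  0.1136  0.1591  0.2045  0.2955  0.4091  1.]
```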
RobustScaler
• Useful when the feature has marginal outliers.
• Subtracts the median, not the mean.
• Does not take the min and max values into account;
instead it uses the interquartile range (IQR).
• Hence it is generally “robust” to outliers.
• But it will not completely remove outliers.
• Could be used when neither StandardScaler nor
MinMaxScaler is appropriate, due to the presence of
outliers.
• Does little to change the shape of the feature’s distribution.
RobustScaler
• For each feature X, we calculate the median (Xmd)
and two quartiles (X0.25 and X0.75).
• (X0.75 − X0.25) is called the interquartile range (IQR).
• For each value Xi in that feature X, calculate:
  New Xi = (Xi − Xmd) / (X0.75 − X0.25)
• For the example column below, Xmd = 14, X0.25 = 11,
and X0.75 = 20.5:

original    scaled
5           -0.94737
10          -0.42105
12          -0.21053
14          0
18          0.421053
23          0.947368
49          3.684211
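A minimal sketch with scikit-learn’s RobustScaler, whose defaults already center on the median and scale by the 25th-75th percentile range:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

scaler = RobustScaler(quantile_range=(25.0, 75.0))  # these are the defaults
X_scaled = scaler.fit_transform(X)

print(scaler.center_)    # [14.] -> Xmd
print(scaler.scale_)     # [9.5] -> IQR = 20.5 - 11
print(X_scaled.ravel())  # approx. [-0.947 -0.421 -0.211  0.     0.421  0.947  3.684]
```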
MaxAbsScaler
• Not so useful when the feature has outliers.
• For each feature X, we calculate the maximum value
(Xmax) in absolute terms.
• For each value Xi in that feature X, calculate:
  New Xi = Xi / Xmax
• For the example column below, Xmax = |−49| = 49:

original    scaled
-5          -0.10204
10          0.204082
12          0.244898
-14         -0.28571
18          0.367347
23          0.469388
-49         -1
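A minimal sketch with scikit-learn’s MaxAbsScaler on the signed column above:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[-5], [10], [12], [-14], [18], [23], [-49]], dtype=float)

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

print(scaler.max_abs_)   # [49.] -> largest absolute value
print(X_scaled.ravel())  # approx. [-0.102  0.204  0.245 -0.286  0.367  0.469 -1.]
```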
MaxAbsScaler
PowerTransformer
• Useful when a more “Gaussian”-like output is desired.
• Currently offers the ‘Box-Cox’ and ‘Yeo-Johnson’
transforms.
• Box-Cox requires the input data to be strictly positive
(not even zero is acceptable).
• For features which have zeros or negative values,
Yeo-Johnson comes to the rescue.
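A minimal sketch with scikit-learn’s PowerTransformer showing both methods; the mixed-sign column is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Box-Cox: input must be strictly positive.
X_pos = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)
pt_bc = PowerTransformer(method='box-cox')  # standardizes the output by default
print(pt_bc.fit_transform(X_pos).ravel())

# Yeo-Johnson (the default method) also accepts zeros and negative values.
X_mixed = np.array([[-5], [0], [12], [49]], dtype=float)  # hypothetical values
pt_yj = PowerTransformer(method='yeo-johnson')
print(pt_yj.fit_transform(X_mixed).ravel())
```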
QuantileTransformer
• Useful when the feature has outliers.
• This method transforms the features to follow a
uniform or a normal distribution. Therefore, for a
given feature, this transformation tends to spread
out the most frequent values. It also reduces the
impact of (marginal) outliers: this is therefore a
robust preprocessing scheme.
• With output_distribution=‘normal’, it makes the data
more Gaussian-like.
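A minimal sketch with scikit-learn’s QuantileTransformer; note that n_quantiles must not exceed the number of samples (7 here):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

qt_uniform = QuantileTransformer(n_quantiles=7, output_distribution='uniform')
print(qt_uniform.fit_transform(X).ravel())  # spread evenly over [0, 1]

qt_normal = QuantileTransformer(n_quantiles=7, output_distribution='normal')
print(qt_normal.fit_transform(X).ravel())   # mapped onto a standard normal; extremes are clipped
```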
Normalizer
• Computes row-wise calculations, instead of the
column-wise ones we’ve been seeing all along.
• Useful for clustering and text-classification tasks.
• Can use the l1 (Manhattan) or l2 (Euclidean) norm as
the “norm” parameter.
• There is also the option norm=‘max’, which scales each
row by dividing every element by the maximum absolute
value in that entire row.
Normalizer
• Dividing factors for the first two example rows,
under each choice of norm:

          norm=‘max’   norm=‘l1’   norm=‘l2’
row 1     680          800.69      688.5
row 2     495          587.69      501.58
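A minimal sketch with scikit-learn’s Normalizer; the rows behind the slide’s dividing factors are not shown in the deck, so the rows below are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical example rows (not the ones from the slide).
X = np.array([[3.0, 4.0, 5.0],
              [1.0, 2.0, 2.0]])

for norm in ('max', 'l1', 'l2'):
    print(norm, Normalizer(norm=norm).fit_transform(X))
# For row [3, 4, 5] the implied dividing factors are:
#   max -> 5, l1 -> 3 + 4 + 5 = 12, l2 -> sqrt(9 + 16 + 25) ~ 7.07
```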


Exponential and log transformer
• Just like any other feature scaling technique: a
mathematical formula is applied to the columns to
scale them accordingly.
• Uses sklearn’s FunctionTransformer class to do the
mathematical calculations, as sketched after the
example below.
• Here, we make use of simple mathematical
transformations like taking the log, squaring, and taking
square roots or cube roots.
• Could be useful on a case-by-case basis; we just need
to experiment and compare the results.
Exponential and log transformer

Square root transformer:

original    scaled
5           2.236
10          3.162
12          3.464
14          3.741
18          4.242
23          4.795
49          7
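A minimal sketch of the square root transformer above using sklearn’s FunctionTransformer; the log1p variant just illustrates swapping in another function:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[5], [10], [12], [14], [18], [23], [49]], dtype=float)

# Square-root transformer, reproducing the table above.
sqrt_tf = FunctionTransformer(func=np.sqrt)
print(sqrt_tf.fit_transform(X).ravel())  # [2.236 3.162 3.464 3.742 4.243 4.796 7.]

# Other functions can be swapped in the same way, e.g. a log transform:
log_tf = FunctionTransformer(func=np.log1p)  # log(1 + x), safe at zero
print(log_tf.fit_transform(X).ravel())
```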
References
• https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
• https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
• https://www.quora.com/Which-machine-algorithms-require-data-scaling-normalization
• https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
