
Scaling Techniques

Tushar B. Kute,
http://tusharkute.com
Why apply feature scaling?

• Real-world datasets contain features that vary widely in magnitude, units, and range.
• Normalisation should be performed when the scale of a feature is irrelevant or misleading, and should not be performed when the scale is meaningful.
• Algorithms that use a Euclidean distance measure are sensitive to magnitudes; here, feature scaling helps to weigh all the features equally.
• Formally, if one feature in the dataset is large in scale compared to the others, then in algorithms where Euclidean distance is measured this large-scale feature becomes dominant and needs to be normalized.
Why apply scaling?

• For example, assume your input dataset contains one column with values ranging from 0 to 1, and another column with values ranging from 10,000 to 100,000.
• The great difference in the scale of the numbers could cause problems when you attempt to combine the values as features during modeling.
• Normalization avoids these problems by creating new values that maintain the general distribution and ratios of the source data, while keeping values within a scale applied across all numeric columns used in the model.
Gradient Descent Based Algorithms

• Machine learning algorithms like linear regression, logistic regression, neural networks, etc. that use gradient descent as an optimization technique require data to be scaled.
• Take a look at the formula for gradient descent below:
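In its standard form, with a model parameter θ, a learning rate α, and a cost function J(θ), the update rule is:

θ := θ − α · ∂J(θ)/∂θ

For models like linear regression, the partial derivative ∂J(θ)/∂θ is proportional to the corresponding feature value X, which is why a feature's scale enters the update directly.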
Gradient Descent Based Algorithms

• The presence of the feature value X in the formula affects the step size of gradient descent. Differences in the ranges of features will therefore cause different step sizes for each feature.
• To ensure that gradient descent moves smoothly towards the minima and that the steps are updated at the same rate for all the features, we scale the data before feeding it to the model.
• Having features on a similar scale can help gradient descent converge more quickly towards the minima, as in the sketch below.
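A minimal sketch, assuming scikit-learn and a synthetic two-feature dataset (one feature on a [0, 1] scale, one on a much larger scale), showing scaling applied in a pipeline before a gradient-descent-based model:

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only: two features on very different scales.
rng = np.random.RandomState(0)
X = np.c_[rng.uniform(0, 1, 100),               # feature ranging 0 to 1
          rng.uniform(10_000, 100_000, 100)]    # feature ranging 10,000 to 100,000
y = 3 * X[:, 0] + 0.0001 * X[:, 1] + rng.normal(size=100)

# Scaling first puts both features on a comparable scale, so the
# gradient steps are updated at a similar rate for each feature.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
model.fit(X, y)
print(model.score(X, y))   # R^2 on the training data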
Distance Based Algorithms

• Distance-based algorithms like KNN, K-Means, and SVM are most affected by the range of features.
• This is because, behind the scenes, they use distances between data points to determine their similarity.
• For example, let's say we have data containing the high-school CGPA scores of students (ranging from 0 to 5) and their future incomes (in thousands of Rupees):
Distance Based Algorithms

• Since the two features have different scales, there is a chance that higher weightage is given to the feature with the higher magnitude.
• This will impact the performance of the machine learning algorithm, and obviously we do not want our algorithm to be biased towards one feature.
Distance Based Algorithms

• Therefore, we scale our data before employing a distance-based algorithm so that all the features contribute equally to the result.
Distance Based Algorithms

• The effect of scaling is conspicuous when we compare the Euclidean distance between the data points for students A and B, and between B and C, before and after scaling, as shown below:
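A minimal sketch, assuming scikit-learn and hypothetical CGPA/income values for students A, B, and C (the slide's own numbers are not reproduced here):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Rows: students A, B, C; columns: [CGPA (0-5), income (thousands of Rupees)].
# Hypothetical values, for illustration only.
X = np.array([[3.0, 60.0],
              [3.2, 40.0],
              [4.8, 45.0]])

def dist(u, v):
    return np.linalg.norm(u - v)   # Euclidean distance

print("raw:    d(A,B)=%.2f  d(B,C)=%.2f" % (dist(X[0], X[1]), dist(X[1], X[2])))

Xs = MinMaxScaler().fit_transform(X)
print("scaled: d(A,B)=%.2f  d(B,C)=%.2f" % (dist(Xs[0], Xs[1]), dist(Xs[1], Xs[2])))

Before scaling, the income column dominates both distances; after scaling, CGPA and income contribute on comparable terms.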
Tree Based Algorithms

• Tree-based algorithms, on the other hand, are fairly insensitive to the scale of the features.
• Think about it: a decision tree only splits a node based on a single feature.
• The decision tree splits a node on the feature that increases the homogeneity of the node. This split on a feature is not influenced by the other features.
• So, there is virtually no effect of the remaining features on the split. This is what makes tree-based algorithms invariant to the scale of the features!
Feature Scaling Techniques

• Min-Max Scaler
– Normalization
• Standard Scaler
– Standardization
• Robust Scaler
– Robust scaling
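In scikit-learn, these are available as ready-made transformers:

# The three scaling techniques covered in these slides, as implemented
# in scikit-learn's preprocessing module.
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler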
Normalization

• Normalization is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
• Here’s the formula for normalization:
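X' = (X − Xmin) / (Xmax − Xmin)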
Normalization

• Here, Xmax and Xmin are the maximum and the minimum
values of the feature respectively.
– When the value of X is the minimum value in the column,
the numerator will be 0, and hence X’ is 0
– On the other hand, when the value of X is the maximum
value in the column, the numerator is equal to the
denominator and thus the value of X’ is 1
– If the value of X is between the minimum and the
maximum value, then the value of X’ is between 0 and 1
Example:

• Data = 1000, 2000, 3000, 9000.
Normalize using min-max normalization, setting new_min = 0 and new_max = 1, with the formula:
v' = ((v − min(A)) / (max(A) − min(A))) × (new_max(A) − new_min(A)) + new_min(A)
• Solution:
Here, new_max(A) = 1, as given in the question (max = 1),
new_min(A) = 0, as given in the question (min = 0),
max(A) = 9000, as the maximum value among 1000, 2000, 3000, 9000 is 9000,
min(A) = 1000, as the minimum value among 1000, 2000, 3000, 9000 is 1000.
Example:

• Case-1: normalizing 1000 –
v = 1000; putting all values in the formula, we get
v' = ((1000 − 1000) / (9000 − 1000)) × (1 − 0) + 0 = 0
Example:

• Case-2: normalizing 2000 –
v = 2000; putting all values in the formula, we get
v' = ((2000 − 1000) / (9000 − 1000)) × (1 − 0) + 0 = 0.125
Example:

• Case-3: normalizing 3000 –
v = 3000; putting all values in the formula, we get
v' = ((3000 − 1000) / (9000 − 1000)) × (1 − 0) + 0 = 0.25
Example:

• Case-4: normalizing 9000 –
v = 9000; putting all values in the formula, we get
v' = ((9000 − 1000) / (9000 − 1000)) × (1 − 0) + 0 = 1
• Outcome:
Hence, the normalized values of 1000, 2000, 3000, 9000 are 0, 0.125, 0.25, 1.
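The same result can be reproduced with scikit-learn's MinMaxScaler, as a quick check:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1000.0], [2000.0], [3000.0], [9000.0]])
scaler = MinMaxScaler(feature_range=(0, 1))   # new_min = 0, new_max = 1
print(scaler.fit_transform(data).ravel())     # [0.    0.125 0.25  1.   ]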
When to apply?

• Normalization is a good technique to use when you do not know the distribution of your data, or when you know the distribution is not Gaussian (a bell curve).
• Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as k-nearest neighbors and artificial neural networks.
Standardization

• Standardization is another scaling technique, where the values are centered around the mean with a unit standard deviation.
• This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.
• Here’s the formula for standardization:
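X' = (X − μ) / σ

where μ is the mean of the feature values and σ is their standard deviation.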
Comparison

• Normalization rescales values into a fixed range (typically 0 to 1), while standardization centers them around the mean with a unit standard deviation.
Z-score

• Simply put, a z-score (also called a standard score) gives you an idea of how far from the mean a data point is.
• More technically, it’s a measure of how many standard deviations below or above the population mean a raw score is.
• A z-score can be placed on a normal distribution curve. Z-scores typically range from −3 standard deviations (which would fall to the far left of the normal distribution curve) up to +3 standard deviations (which would fall to the far right of the normal distribution curve).
• In order to use a z-score, you need to know the mean μ and also the population standard deviation σ.
Z-score

• Z-scores are a way to compare results to a “normal” population. Results from tests or surveys have thousands of possible results and units; those results can often seem meaningless.
• For example, knowing that someone’s weight is 150 pounds might be good information, but if you want to compare it to the “average” person’s weight, looking at a vast table of data can be overwhelming (especially if some weights are recorded in kilograms).
• A z-score can tell you where that person’s weight is compared to the average population’s mean weight.
Z-score

• The basic z-score formula, using population parameters, is:
z = (x − μ) / σ
• For example, let’s say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z-score would be:
z = (x − μ) / σ = (190 − 150) / 25 = 1.6
• The z-score tells you how many standard deviations from the mean your score is. In this example, your score is 1.6 standard deviations above the mean.
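A quick check of the arithmetic in plain Python:

x, mu, sigma = 190, 150, 25   # values from the example above
z = (x - mu) / sigma
print(z)                      # 1.6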
Z-score

• You may also see the z-score formula written for a sample:
z = (x − x̄) / s
• This is exactly the same formula as z = (x − μ) / σ, except that x̄ (the sample mean) is used instead of μ (the population mean) and s (the sample standard deviation) is used instead of σ (the population standard deviation). However, the steps for solving it are exactly the same.
Standardization

• Standardization assumes that your data has a Gaussian (bell curve) distribution.
• This does not strictly have to be true, but the technique is more effective if your attribute distribution is Gaussian.
• Standardization is useful when your data has varying scales and the algorithm you are using does make assumptions about your data having a Gaussian distribution, such as linear regression, logistic regression, and linear discriminant analysis.
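A minimal sketch, assuming scikit-learn, standardizing the same sample data used in the min-max example:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1000.0], [2000.0], [3000.0], [9000.0]])
Xs = StandardScaler().fit_transform(X)
print(Xs.ravel())             # values centered around zero
print(Xs.mean(), Xs.std())    # approximately 0.0 and 1.0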
Maximum Absolute Scaling

• Maximum absolute scaling scales the data by its maximum absolute value; that is, it divides every observation by the maximum absolute value of the variable:
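X' = X / max(|X|)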

• The result of the preceding transformation is a distribution in which the values vary approximately within the range of −1 to 1.
• Scikit-learn recommends using this transformer on data that is centered at zero or on sparse data.
• This scaler is sensitive to outliers if all the values are positive.
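A minimal sketch with scikit-learn's MaxAbsScaler, on hypothetical values:

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X = np.array([[-4.0], [2.0], [8.0]])            # hypothetical values; max |X| = 8
print(MaxAbsScaler().fit_transform(X).ravel())  # [-0.5   0.25  1.  ]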
Robust Scaler

• The Robust Scaler algorithm scales features in a way that is robust to outliers.
• The method it follows is similar to the Min-Max Scaler, but it uses the interquartile range rather than the min and max used in the Min-Max Scaler.
• The median is removed and the data is scaled according to the quantile range by this scaling algorithm.
• It thus follows the following formula:
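X' = (X − median(X)) / (Q3 − Q1)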

• where Q1 is the first quartile and Q3 is the third quartile.
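A minimal sketch with scikit-learn's RobustScaler, on hypothetical data containing one outlier; the outlier shifts neither the median nor the interquartile range, so the bulk of the data keeps a sensible scale:

import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # hypothetical data; 100 is an outlier
print(RobustScaler().fit_transform(X).ravel())
# [-1.  -0.5  0.   0.5  48.5]   (median 3, IQR = 4 - 2 = 2)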


Thank you
This presentation is created using LibreOffice Impress 5.1.6.2 and can be used freely as per the GNU General Public License.

/mITuSkillologies  @mitu_group  /company/mitu-skillologies  MITUSkillologies

Web Resources
https://mitu.co.in
http://tusharkute.com

[email protected]
[email protected]
