Feature Scaling Notes
Generally, when we get raw data, the feature values vary on different scales. It is important
to bring all the feature values onto the same scale so that the value of one feature does not
dominate the others and hinder the performance of the learning algorithm. This process of
bringing all the feature values onto the same scale is called feature scaling. Ensuring
standardised feature values implicitly weights all features equally in their representation.
Feature scaling (or re-scaling) of the features is most of the time performed so that they have
the properties of a standard normal distribution, with mean = 0 and standard deviation = 1.
There are several ways to apply feature scaling to a dataset:
1. The easiest way of scaling is to use the preprocessing.scale() function.
A NumPy array of values is given as input, and the output is a NumPy array with scaled
values. This scales the values in such a way that their mean will be 0 and their standard
deviation will be 1 (a runnable sketch follows this list).
2. Another method is to scale the features between a given minimum and maximum value,
generally between 0 and 1 (see the second sketch below).
Function - preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
Formula used -
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min and max are the bounds given in feature_range.
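A minimal sketch of method 1, assuming scikit-learn and NumPy are installed; the array values
are made up purely for illustration:

    import numpy as np
    from sklearn import preprocessing

    # Toy feature matrix; each column is one feature on its own scale
    X = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

    # preprocessing.scale() standardises each column independently
    X_scaled = preprocessing.scale(X)

    print(X_scaled)
    print("means:", X_scaled.mean(axis=0))  # approximately [0. 0.]
    print("stds: ", X_scaled.std(axis=0))   # approximately [1. 1.]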
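And a sketch of method 2 using MinMaxScaler on the same made-up values; fit_transform()
learns the per-column min and max and then applies the formula above:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])

    # Scale each column to the [0, 1] range
    scaler = MinMaxScaler(feature_range=(0, 1))
    X_scaled = scaler.fit_transform(X)

    print(X_scaled)
    # [[0.  0. ]
    #  [0.5 0.5]
    #  [1.  1. ]]

Unlike the one-shot scale() call, the fitted scaler object remembers the training min/max, so
the same transformation can later be applied to new data with scaler.transform().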