Feature Scaling in Machine Learning

The document discusses various feature scaling techniques in machine learning, including Absolute Maximum Scaling, Min-Max Scaling, Standardization, and Robust Scaling, each with its own formula and application. It also covers handling missing values using SimpleImputer and KNN Imputer, as well as methods for outlier detection and removal such as Z-Score and Interquartile Range (IQR). Visualization techniques like box plots and scatter plots are mentioned for identifying outliers in the dataset.
Feature Scaling in Machine Learning

• Absolute Maximum Scaling: MaxAbs scaling divides each feature by its maximum
absolute value, so the scaled values fall in the range [-1, 1]. Because it only
divides by a positive constant, it preserves the sign of every value, which makes it
especially useful when your data contains both positive and negative values.
• Formula: $x' = \dfrac{x}{\max(|x|)}$
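As a minimal sketch of the scaling described above, the plain-Python function below divides a feature by its largest absolute value. The function name `max_abs_scale` and the sample values are made up for illustration; scikit-learn offers the same transformation as `MaxAbsScaler`.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the feature."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # An all-zero feature has nothing to scale; return it unchanged.
        return list(values)
    return [v / max_abs for v in values]


feature = [-50.0, 10.0, 25.0, 100.0]  # made-up sample values
scaled = max_abs_scale(feature)
print(scaled)  # [-0.5, 0.1, 0.25, 1.0]
```

Note that the negative value stays negative: only the magnitude is rescaled.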
• Min-Max Scaling: Min-Max Scaling is a feature scaling technique that transforms
the data into a specific range, typically [0, 1]. This is achieved by subtracting the
minimum value of the feature and dividing by the range (the difference between
the maximum and minimum values of the feature).
• Formula: $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$
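The subtraction and division described above can be sketched in a few lines of plain Python. The function name `min_max_scale` and its `new_min`/`new_max` parameters are illustrative assumptions; scikit-learn provides an equivalent `MinMaxScaler`.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Map values linearly so the minimum becomes new_min and the maximum new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range; map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


feature = [10.0, 20.0, 30.0, 50.0]  # made-up sample values
scaled = min_max_scale(feature)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```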
• Standardization: Standardization (also known as Z-score normalization) is a
technique used to scale features so that they have a mean of 0 and a standard
deviation of 1. This is done by subtracting the mean of the feature and dividing by
the standard deviation.
• Formula: $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.
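A minimal sketch of standardization using only the Python standard library follows. The function name `standardize` and the sample data are assumptions for illustration. The sketch uses the population standard deviation, which is the convention scikit-learn's `StandardScaler` follows.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no spread; every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, std 2
z_scores = standardize(feature)
print(z_scores)
```

After the transformation the feature has mean 0 and standard deviation 1, whatever its original units were.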
• Robust Scaling: Robust Scaling is a feature scaling technique that centers the
data on the median and divides by the interquartile range (IQR). It is robust to
outliers because it does not rely on the mean and standard deviation, both of
which can be significantly affected by extreme values.
• Formula: $x' = \dfrac{x - \mathrm{median}(x)}{\mathrm{IQR}(x)}$, where $\mathrm{IQR}(x) = Q_3 - Q_1$.
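The median-and-IQR scaling described above can be sketched with the standard library as follows. The function name `robust_scale` and the data are illustrative assumptions, and the quartiles use the `inclusive` interpolation method, which is one of several quartile conventions. scikit-learn offers a comparable `RobustScaler`.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # No spread in the middle half of the data; only center it.
        return [v - median for v in values]
    return [(v - median) / iqr for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]  # 100 is an outlier
scaled = robust_scale(feature)
print(scaled)  # [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 23.75]
```

The outlier remains far from the rest, but it does not compress the ordinary values, because the median (5) and IQR (4) are computed from the middle of the data.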
• Z-score normalization: Z-score normalization is another name for Standardization:
it transforms features so that they have a mean of 0 and a standard deviation of 1.
This transformation is important when features have different scales or units,
which can hurt the performance of many machine learning algorithms, especially
those based on distance metrics (like k-nearest neighbors) and those trained with
gradient-based optimization.
• Formula: $z = \dfrac{x - \mu}{\sigma}$
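To illustrate why differing scales affect distance-based algorithms, the sketch below compares Euclidean distances before and after z-score normalization. The data (age in years, income in dollars) and the helper name `standardize_columns` are invented for this example.

```python
import math
import statistics


def standardize_columns(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


# Made-up (age, income) rows. Person 1 differs from person 0 mostly in age,
# person 2 differs from person 0 mostly in income.
people = [(25, 50_000), (60, 51_000), (26, 58_000)]

# On raw data the income column dominates the distance.
raw_ratio = math.dist(people[0], people[2]) / math.dist(people[0], people[1])

# After standardization both columns contribute on a comparable scale.
scaled = standardize_columns(people)
scaled_ratio = math.dist(scaled[0], scaled[2]) / math.dist(scaled[0], scaled[1])

print(round(raw_ratio, 2), round(scaled_ratio, 2))
```

On the raw values, person 2 looks roughly eight times farther from person 0 than person 1 does, purely because income is measured in larger numbers. After scaling, the two distances are of similar size.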
Handling Missing Values in a CSV File using Sklearn

• Identify Missing Values
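One common way to identify missing values is to load the CSV with pandas and count nulls per column. The sketch below assumes pandas is available and uses a small in-memory CSV with invented columns (`age`, `salary`, `city`) in place of a real file path.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = """age,salary,city
25,50000,Pune
,62000,Delhi
31,,Pune
40,58000,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Number of missing entries in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```

For a real file, `pd.read_csv` would take the file path instead of the `StringIO` object.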


• Using SimpleImputer (Mean, Median, Mode)

Mean

Median
Mode
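The three strategies listed above can be sketched with scikit-learn's `SimpleImputer`, where the mode corresponds to `strategy="most_frequent"`. The array below is made-up data; with a CSV you would pass the numeric columns of the loaded table instead.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [1.0, np.nan],
    [np.nan, 10.0],
    [4.0, 40.0],
    [6.0, 20.0],
])

# Each imputer replaces NaN with a statistic of the observed values per column.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
median_filled = SimpleImputer(strategy="median").fit_transform(X)
mode_filled = SimpleImputer(strategy="most_frequent").fit_transform(X)

print(mean_filled[2, 0], mean_filled[1, 1])      # column means: 3.0 and 20.0
print(median_filled[2, 0], median_filled[1, 1])  # column medians: 2.5 and 15.0
print(mode_filled[2, 0], mode_filled[1, 1])      # column modes: 1.0 and 10.0
```

The median is usually the safer choice when a column contains outliers, while the mode also works for categorical columns.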

• Using KNN Imputer
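A minimal sketch of KNN-based imputation with scikit-learn's `KNNImputer` follows. It fills each missing entry with the average of that feature over the nearest rows, where nearness is measured on the features both rows have. The data and the choice of `n_neighbors=2` are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data: the second feature is missing in the third row.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [8.0, 16.0],
])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

# The two nearest rows (by the first feature) are [2, 4] and [1, 2],
# so the missing value becomes the mean of 4 and 2.
print(X_filled[2, 1])  # 3.0
```

Because neighbors are found by distance, scaling the features first generally matters for this imputer.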


Outlier Detection

1. Outlier: An outlier is a data point or observation that lies significantly outside the
general pattern of the data. It is an observation that is noticeably different from the
other data points and can either be much higher or much lower than the rest of the
dataset. Outliers can occur due to variability in the data, errors in data collection,
or even rare but valid occurrences.
2. How to identify outliers
• Using Visualization Techniques:
a) Box Plot: Outliers appear as separate points outside the whiskers.
b) Scatter Plot: Helps in spotting extreme values in two-dimensional data.
3. How to remove outliers from the dataset
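a) Z-Score Method (Standard Score): Measures how far a data point is from the
mean in terms of standard deviations. Points whose absolute z-score exceeds a
chosen threshold (3 is a common choice) are treated as outliers.
Formula: $z = \dfrac{x - \mu}{\sigma}$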
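The Z-score method can be sketched in plain Python as shown below. The function name `remove_outliers_zscore`, the sample data, and the default threshold of 3 are assumptions; the threshold is a common convention and not a fixed rule.

```python
import statistics


def remove_outliers_zscore(values, threshold=3.0):
    """Keep only values whose absolute z-score is at most the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so none can be an outlier.
        return list(values)
    return [v for v in values if abs((v - mean) / std) <= threshold]


# Made-up data: fourteen ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 12, 95]
cleaned = remove_outliers_zscore(data)
print(len(data), len(cleaned))  # 15 14
```

One caveat: in very small samples an extreme point inflates the standard deviation so much that its own z-score can never reach 3, so the method suits larger datasets better.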

[Figures: scatter plots of the data before and after removing outliers]

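b) Interquartile Range (IQR) Method: Based on the first and third quartiles (Q1 and
Q3). Points below the lower bound or above the upper bound are treated as outliers.
Formula: $\mathrm{IQR} = Q_3 - Q_1$, lower bound $= Q_1 - 1.5 \times \mathrm{IQR}$, upper bound $= Q_3 + 1.5 \times \mathrm{IQR}$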
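A minimal sketch of the IQR method using the standard library follows. The function names, the sample data, and the multiplier `k=1.5` are illustrative assumptions; 1.5 is the conventional multiplier and the same rule box plots typically use to place their whiskers.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) bounds: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers_iqr(values, k=1.5):
    """Keep only values that lie inside the IQR bounds."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


data = [12, 15, 14, 13, 16, 15, 14, 13, 90]  # made-up data; 90 is an outlier
print(iqr_bounds(data))           # (10.0, 18.0)
print(remove_outliers_iqr(data))  # [12, 15, 14, 13, 16, 15, 14, 13]
```

Here Q1 is 13 and Q3 is 15, so the IQR is 2 and anything outside [10, 18] is dropped.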

[Figures: box plots of the data before and after removing outliers]
Aggregation functions
• Mean, Min, Max
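The notes do not show how these aggregates were computed, so the following is only a sketch of one common approach, assuming pandas and an invented table with `city` and `salary` columns.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "salary": [50000, 60000, 55000, 65000],
})

# Aggregates over the whole column.
overall = df["salary"].agg(["mean", "min", "max"])
print(overall)

# The same aggregates computed separately for each city.
per_city = df.groupby("city")["salary"].agg(["mean", "min", "max"])
print(per_city)
```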
