
ML Normalization Techniques - Overview & Practical Guide

The document discusses various normalization techniques in machine learning, including Min-Max, Z-Score, Max Absolute, Robust Scaling, Logarithmic, and L2 normalization. Each technique is explained with its formula, description, use cases, and Python implementation examples. The document also provides a comparison of recommended normalization methods for different algorithms.


Normalization Techniques in Machine Learning

Normalization is a preprocessing step where data is rescaled to fit within a specific range or
distribution, improving model performance and convergence. Below is a detailed explanation of
each type, along with Python examples.

1. Min-Max Normalization

Formula:

X′ = (X − X_min) / (X_max − X_min)

Description:

 Rescales data to a range of [0, 1] or, more generally, [a, b].


 Preserves relationships between values and does not handle outliers well.

Use Cases:

 When features have different ranges.


 For algorithms sensitive to magnitudes, such as KNN, SVM, and Neural Networks.
 Suitable for image data (e.g., pixel values).

Python Implementation:

from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

scaler = MinMaxScaler()
min_max_data = scaler.fit_transform(data)
print("Min-Max Normalized Data:\n", min_max_data)
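The description above mentions scaling to an arbitrary range [a, b]; in scikit-learn this is done with the feature_range parameter of MinMaxScaler. A minimal sketch, reusing the same data array, that rescales into [-1, 1]:

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# feature_range=(-1, 1) rescales each column into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data)
print("Min-Max [-1, 1] Data:\n", scaled)
```

Each column's minimum maps to -1 and its maximum to 1, with the midpoint row landing at 0.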

2. Z-Score Normalization (Standardization)

Formula:


X′ = (X − μ) / σ
Where:
 μ: Mean of the feature.
 σ: Standard deviation.

Description:

 Centers data around a mean of 0 and a standard deviation of 1.


 Effective for algorithms assuming normally distributed data.

Use Cases:

 Features with differing scales.


 Algorithms like Logistic Regression, Linear Regression, and PCA.

Python Implementation:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
z_score_data = scaler.fit_transform(data)
print("Z-Score Normalized Data:\n", z_score_data)
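In practice the scaler should be fitted on the training data only, then applied to new data with transform, so that test-set statistics do not leak into the model. A small sketch with illustrative train/test arrays:

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
test = np.array([[2.0, 250.0]])

scaler = StandardScaler()
train_z = scaler.fit_transform(train)  # learns mean and std from the training data only
test_z = scaler.transform(test)        # reuses the training statistics on new data

print("Train mean per column:", train_z.mean(axis=0))  # approximately 0
print("Scaled test row:", test_z)
```

The test row is scaled with the training mean and standard deviation, which is what the model will see at prediction time.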

3. Max Absolute Normalization

Formula:

X′ = X / max(∣X∣)
Description:

 Scales data by the maximum absolute value of the feature.


 Retains sparsity of data.

Use Cases:

 Sparse datasets like text data or recommendation systems.


 Models such as Lasso Regression.

Python Implementation:

from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()
max_abs_data = scaler.fit_transform(data)
print("Max Absolute Normalized Data:\n", max_abs_data)
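Because MaxAbsScaler only divides by a per-feature constant and never shifts the data, it can operate directly on scipy sparse matrices without densifying them, which is why it suits the sparse use cases listed above. A sketch with a small illustrative sparse matrix:

```python
from sklearn.preprocessing import MaxAbsScaler
from scipy.sparse import csr_matrix
import numpy as np

sparse_data = csr_matrix(np.array([[0.0, 5.0], [0.0, -10.0], [2.0, 0.0]]))

scaler = MaxAbsScaler()
scaled = scaler.fit_transform(sparse_data)

# zero entries stay zero, so the sparsity pattern is preserved
print("Scaled sparse data:\n", scaled.toarray())
print("Stored nonzeros:", scaled.nnz)
```

The number of stored nonzeros is unchanged, confirming that sparsity is retained.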
4. Robust Scaling

Formula:

X′ = (X − median(X)) / IQR
Where:

 IQR = Q3−Q1 (Interquartile Range).

Description:

 Centers data using the median and scales by the IQR.


 Handles outliers effectively.

Use Cases:

 Datasets with significant outliers.


 Algorithms like Gradient Boosting or Tree-based models.

Python Implementation:

from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
robust_data = scaler.fit_transform(data)
print("Robust Scaled Data:\n", robust_data)
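The data array used above contains no outliers, so the benefit is easier to see on a column with one extreme value. A sketch comparing RobustScaler against the median/IQR formula computed by hand (the outlier_data array is illustrative):

```python
from sklearn.preprocessing import RobustScaler
import numpy as np

outlier_data = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scaler = RobustScaler()
robust = scaler.fit_transform(outlier_data)

# median = 3, IQR = Q3 - Q1 = 4 - 2 = 2, so the formula gives (X - 3) / 2
manual = (outlier_data - np.median(outlier_data)) / 2.0
print("Robust scaled:\n", robust)
```

The outlier is still large after scaling, but it no longer distorts the center and spread used for the other points, unlike the mean and standard deviation in z-score normalization.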

5. Logarithmic Normalization

Formula:

X′ = log(X + c)
Where:

 c: A constant to handle zero or negative values.

Description:

 Reduces the impact of large values by compressing the range.


 Helps in reducing skewness.
Use Cases:

 Features with exponential growth or large ranges.


 Financial and population data.

Python Implementation:

import numpy as np

data = np.array([[1, 10, 100], [2, 20, 200], [3, 30, 300]])
log_data = np.log1p(data) # log1p is log(X + 1)
print("Logarithmic Normalized Data:\n", log_data)
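np.log1p pairs with np.expm1, which computes exp(X) − 1 and therefore inverts the transform exactly, so the original values can be recovered after modeling. A quick round-trip check on the same data:

```python
import numpy as np

data = np.array([[1, 10, 100], [2, 20, 200], [3, 30, 300]])
log_data = np.log1p(data)

# expm1 is the exact inverse of log1p
recovered = np.expm1(log_data)
print("Recovered data:\n", recovered)
```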

6. L2 Normalization

Formula:

X′ = X / ∥X∥₂
Where:

∥X∥₂ = √(Σ X²)
Description:

 Scales each data point so its Euclidean norm is 1.


 Commonly used for feature vectors.

Use Cases:

 Similarity-based algorithms like KNN or clustering.


 Text processing and recommendation systems.

Python Implementation:

from sklearn.preprocessing import Normalizer

scaler = Normalizer(norm='l2')
l2_data = scaler.fit_transform(data)
print("L2 Normalized Data:\n", l2_data)
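Each output row should have Euclidean norm 1, which can be verified with np.linalg.norm. A sketch using two illustrative rows that point in the same direction but have different magnitudes:

```python
from sklearn.preprocessing import Normalizer
import numpy as np

data = np.array([[3.0, 4.0], [6.0, 8.0]])

scaler = Normalizer(norm='l2')
l2_data = scaler.fit_transform(data)

# both rows point in the same direction, so they map to the same unit vector
row_norms = np.linalg.norm(l2_data, axis=1)
print("Row norms:", row_norms)
print("L2 data:\n", l2_data)
```

This is why L2 normalization suits similarity-based methods: after scaling, only the direction of each vector matters, not its length.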

Comparison and When to Use

Algorithm                     Recommended Normalization    Reason

KNN, SVM, Neural Networks     Min-Max or Z-Score           Sensitive to feature scale.

Linear/Logistic Regression    Z-Score or Robust Scaling    Assumes normally distributed features.

Tree-Based Models             None or Robust Scaling       Splits are insensitive to scaling.

PCA, Clustering (K-Means)     Min-Max or Z-Score           Distance metric dependent.

Text or Sparse Data Models    Max Absolute or L2           Maintains sparsity.
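In scikit-learn, a recommended pairing can be expressed as a Pipeline so that the scaler is always fitted on the training data and reapplied consistently at prediction time. A sketch pairing Min-Max scaling with KNN (the toy data below is illustrative):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# two well-separated classes whose features live on very different scales
X = np.array([[1.0, 1000.0], [2.0, 1100.0], [9.0, 5000.0], [10.0, 5100.0]])
y = np.array([0, 0, 1, 1])

# the pipeline fits the scaler on X, then fits KNN on the scaled features
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=1))
model.fit(X, y)

print("Prediction:", model.predict(np.array([[1.5, 1050.0]])))
```

Without scaling, the second feature would dominate the distance computation; the pipeline keeps both features on equal footing.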
