Lecture-11 - Feature Scaling

ECSE219L – Statistical Machine Learning
Course Instructor: Dr. Ashima Yadav
Feature scaling

• Feature scaling is a crucial preprocessing step in machine learning where the range of the features is standardized or normalized.
• This ensures that all features contribute equally to the model,
especially when they have different units or scales.
• Feature scaling is important for many machine learning algorithms,
such as gradient-based methods (e.g., logistic regression, neural
networks) and distance-based algorithms (e.g., K-Nearest Neighbors,
SVM).
Why Feature Scaling is Important:
• Prevents Dominance of Features: Without scaling, features
with larger ranges or units can dominate the model's
learning process, leading to biased results.
• Speeds Up Convergence: Algorithms like gradient descent
converge faster when features are on a similar scale.
• Improves Accuracy: Distance-based algorithms like KNN, SVM, and clustering methods rely on distance calculations, so having features on the same scale ensures that no feature disproportionately influences the distance metric (see the sketch below).
Common Feature Scaling Techniques:
When to Use Min-Max Scaling (Normalization):
• When you need the features to be within a specific range (e.g., [0, 1]); see the sketch after this list.
• When working with algorithms that do not assume any specific distribution and are sensitive to the scale of input features, such as neural networks or distance-based methods (e.g., K-Nearest Neighbors, SVM with an RBF kernel).
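A minimal sketch of the Min-Max transform, x' = (x - x_min) / (x_max - x_min), applied column-wise to a small assumed dataset; this is what scikit-learn's MinMaxScaler computes with its default [0, 1] range.

import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)

# Column-wise minimum and maximum
X_min, X_max = X.min(axis=0), X.max(axis=0)

# Min-Max scaling: every column is mapped onto [0, 1]
X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)   # each column becomes [0, 0.25, 0.5, 0.75, 1.0]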
Key Differences
Purpose:
• Standardization: Centers the data around the mean (0) and scales it to have a
standard deviation of 1. It’s used when you want to give equal importance to all
features regardless of their original scale.
• Normalization: Scales the data to a specific range, usually [0, 1]. It’s used when you
want to constrain features within a specific boundary or when different features
have different ranges.
Use Cases:
• Standardization: Preferred when the algorithm assumes normally distributed inputs or when the data is itself approximately normally distributed.
• Normalization: Preferred when the scale of features varies widely and
you need to constrain features to a specific range.
Key Differences
Effect on Data:
• Standardization: Shifts and scales the data to have a mean of 0
and a standard deviation of 1, but it doesn't necessarily bound
the values within a fixed range.
• Normalization: Scales the data to fit within a specific range (e.g.,
[0, 1]), ensuring that all values lie within this range.
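To complement the Min-Max sketch above, here is a by-hand sketch of standardization, z = (x - mean) / std, on the same assumed data; the population standard deviation (ddof=0) is used, matching StandardScaler's default.

import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]], dtype=float)

# Standardization: subtract the column mean, divide by the column standard deviation
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_standardized.round(3))
# each column becomes [-1.414, -0.707, 0.0, 0.707, 1.414]: mean 0, std 1,
# but the values are not confined to a fixed range such as [0, 1]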
Example in Python:
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Example data
data = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])

# Standardization
scaler_standard = StandardScaler()
data_standardized = scaler_standard.fit_transform(data)

# Normalization
scaler_minmax = MinMaxScaler()
data_normalized = scaler_minmax.fit_transform(data)

print("Original Data:\n", data)
print("Standardized Data:\n", data_standardized)
print("Normalized Data:\n", data_normalized)
Output
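Worked out by hand for this data: both columns standardize to approximately [-1.414, -0.707, 0, 0.707, 1.414] and normalize to exactly [0, 0.25, 0.5, 0.75, 1.0] (both columns are evenly spaced, so they map to the same scaled values).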
Feature scaling
Example: Feature Scaling with Python (Scikit-Learn)
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
import numpy as np

# Sample data (2 features with different scales)
data = np.array([[1, 1000],
                 [2, 1500],
                 [3, 2000],
                 [4, 2500],
                 [5, 3000]])

# Min-Max Scaling
min_max_scaler = MinMaxScaler()
data_min_max_scaled = min_max_scaler.fit_transform(data)
print("Min-Max Scaled Data:\n", data_min_max_scaled)

# Standardization (Z-Score Normalization)
standard_scaler = StandardScaler()
data_standard_scaled = standard_scaler.fit_transform(data)
print("Standard Scaled Data:\n", data_standard_scaled)

# Robust Scaling
robust_scaler = RobustScaler()
data_robust_scaled = robust_scaler.fit_transform(data)
print("Robust Scaled Data:\n", data_robust_scaled)
Output:
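Worked out by hand for this data: Min-Max scaling maps each column to [0, 0.25, 0.5, 0.75, 1.0]; standardization maps each column to approximately [-1.414, -0.707, 0, 0.707, 1.414]; and robust scaling (centering by the median and dividing by the interquartile range) maps each column to [-1, -0.5, 0, 0.5, 1].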
When to Use Different Scaling Techniques:
• Min-Max Scaling: When you need the data in a specific range (e.g., [0, 1]) or
when working with algorithms like neural networks that benefit from
normalized input.
• Standardization: When working with algorithms like linear regression, logistic
regression, and SVMs that assume normally distributed data.
• Robust Scaling: When your dataset contains outliers that could distort the scaling (see the sketch below).
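A minimal sketch (with a small assumed dataset containing one extreme value) of why robust scaling is preferred when outliers are present:

import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])   # 100 is an outlier

# The outlier inflates the mean and standard deviation, squashing the inliers together.
print(StandardScaler().fit_transform(x).ravel().round(2))

# The median and IQR are barely affected by the outlier, so the inliers keep their spread.
print(RobustScaler().fit_transform(x).ravel().round(2))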

In conclusion, feature scaling is essential in preparing your data for machine learning models, ensuring that each feature contributes equally to the model and that learning algorithms can perform optimally.
What is Normal Distribution?
The normal distribution, also known as the Gaussian distribution, is a continuous
probability distribution that is symmetrical and bell-shaped. It is one of the most
important distributions in statistics and machine learning due to its prevalence in
natural and social phenomena. The key characteristics of a normal distribution are:

• Symmetry: The distribution is symmetric around the mean, meaning the left
and right halves of the curve are mirror images of each other.
• Bell-Shaped Curve: The highest point on the curve is at the mean, and the
tails of the distribution approach the horizontal axis but never touch it.
• Mean, Median, and Mode: In a normal distribution, the mean, median, and
mode of the data are all equal and located at the center of the distribution.
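A quick numerical check of these properties, using simulated data with assumed parameters (mu = 50, sigma = 10; not from the slides):

import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=50, scale=10, size=100_000)   # mu = 50, sigma = 10

# For a symmetric, bell-shaped distribution the mean and median coincide (here, ~50).
print(round(float(np.mean(samples)), 2), round(float(np.median(samples)), 2))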
What is Normal Distribution?
68-95-99.7 Rule: In a normal distribution,
• 68% of the data falls within 1 standard deviation (σ) of the mean (μ).
• 95% falls within 2 standard deviations.
• 99.7% falls within 3 standard deviations.
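A quick simulation-based check of this rule (standard-normal samples; a sketch, not a proof):

import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)   # samples from N(0, 1)

for k in (1, 2, 3):
    # fraction of samples within k standard deviations of the mean
    print(k, round(float(np.mean(np.abs(z) <= k)) * 100, 1), "%")
# prints approximately 68.3 %, 95.4 %, 99.7 %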
What is Normal Distribution?

• The empirical rule allows researchers to calculate the probability of randomly obtaining a score from a normal distribution.
• 68% of data falls within the first standard deviation from the mean.
• This means there is a 68% probability of randomly selecting a score between -1 and +1 standard deviations from the mean.
What is Normal Distribution?

• 95% of the values fall within two standard deviations from the mean.
• This means there is a 95% probability of randomly selecting a score between -2 and +2 standard deviations from the mean.
Importance of Normal Distribution in Machine Learning
• Assumptions in Algorithms: Many machine learning algorithms, such as linear regression,
logistic regression, and linear discriminant analysis (LDA), assume that the input features or
errors follow a normal distribution. If this assumption holds, these algorithms can perform
optimally.
• Central Limit Theorem (CLT): The CLT states that the sum (or average) of a large number of
independent, identically distributed random variables will be approximately normally
distributed, regardless of the original distribution. This means that even if your data is not
normally distributed, the distribution of the sample means will tend to be normal if the
sample size is large enough. This property is often used in hypothesis testing and
confidence intervals.
• Parameter Estimation: In statistical modeling and machine learning, normal distribution is
used to estimate parameters like mean and variance, which are essential for modeling and
prediction. For example, maximum likelihood estimation (MLE) often assumes normality in
the data.
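A minimal sketch of the CLT point above: means of samples drawn from a strongly skewed exponential distribution are themselves approximately normally distributed (assumed parameters, for illustration only).

import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of size 50 from a skewed exponential distribution (true mean = 1.0)
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# The sample means cluster symmetrically around 1.0 with spread ~ 1/sqrt(50) ≈ 0.14,
# roughly normal even though the raw exponential data is far from normal.
print(round(float(sample_means.mean()), 3), round(float(sample_means.std()), 3))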
Importance of Normal Distribution in Machine Learning

• Data Transformation: If the data does not follow a normal distribution, transforming it (e.g., using a logarithmic or square root transformation) to make it more normally distributed can improve the performance of some machine learning models.
• Outlier Detection: In a normal distribution, values that lie far from the mean
(typically beyond 3 standard deviations) are considered outliers. This can help
in identifying and treating outliers in datasets.
• PCA (Principal Component Analysis): PCA assumes that the data is normally
distributed and works well when this assumption is met. PCA is used for
dimensionality reduction, and normal distribution helps in ensuring that the
principal components capture most of the variance in the data.
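A minimal sketch combining the first two points above: a log transformation of assumed right-skewed data, followed by flagging values beyond 3 standard deviations as outliers.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed data (e.g., incomes); log-normal by construction
incomes = rng.lognormal(mean=10, sigma=1, size=1000)

log_incomes = np.log(incomes)   # roughly normal after the transform

# Flag points lying more than 3 standard deviations from the mean
z = (log_incomes - log_incomes.mean()) / log_incomes.std()
print(int(np.sum(np.abs(z) > 3)))   # typically only a handful out of 1000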
Applications in Machine Learning:
• Linear Regression: The assumption of normally distributed residuals (errors)
ensures that the linear regression model makes reliable predictions and that
confidence intervals for coefficients are valid.
• Naive Bayes Classifier: In the Gaussian Naive Bayes variant, it is assumed that the
features follow a normal distribution, which simplifies the computation of
probabilities.
• Hypothesis Testing: Many statistical tests, like t-tests and z-tests, assume that
the underlying data is normally distributed. These tests are used in feature
selection, A/B testing, and model evaluation.
• Anomaly Detection: In anomaly detection, normal distribution helps define what
is considered "normal" behavior. Points far from the mean can be flagged as
anomalies or outliers.
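A minimal sketch of the Gaussian Naive Bayes point above, using a tiny assumed toy dataset: the classifier fits a normal distribution per feature and class, then predicts with Bayes' rule.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy 1-D data: class 0 clusters near 1.0, class 1 clusters near 3.0
X = np.array([[1.0], [1.2], [0.9], [3.0], [3.2], [2.9]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNB().fit(X, y)
print(model.predict([[1.1], [3.1]]))   # expected: [0 1]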
When Data is Not Normally Distributed:
Not all data follows a normal distribution. In such cases, you can:
• Transform the Data: Apply transformations (e.g., logarithmic, square root) to
make the data more normally distributed.
• Use Non-Parametric Methods: Algorithms that do not assume a normal
distribution, such as decision trees, random forests, and support vector
machines, can be employed.

In conclusion, understanding the normal distribution and its properties can significantly enhance your ability to build and evaluate machine learning models effectively. Whether through directly leveraging this distribution's assumptions or using transformations and techniques to adjust for non-normal data, it is a foundational concept in data science and machine learning.
