Lecture-11 - Feature Scaling
Statistical Machine Learning
Course Instructor
Dr. Ashima Yadav
Feature scaling
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Example data: two columns on different scales
data = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])

# Standardization: zero mean, unit variance per column
scaler_standard = StandardScaler()
data_standardized = scaler_standard.fit_transform(data)

# Normalization: rescale each column to the [0, 1] range
scaler_minmax = MinMaxScaler()
data_normalized = scaler_minmax.fit_transform(data)

# Robust Scaling: center on the median, scale by the interquartile range (IQR)
robust_scaler = RobustScaler()
data_robust_scaled = robust_scaler.fit_transform(data)
print("Robust Scaled Data:\n", data_robust_scaled)
Output:
Robust Scaled Data:
 [[-1.  -1. ]
 [-0.5 -0.5]
 [ 0.   0. ]
 [ 0.5  0.5]
 [ 1.   1. ]]
When to Use Different Scaling Techniques:
• Min-Max Scaling: When you need the data in a specific range (e.g., [0, 1]) or
when working with algorithms like neural networks that benefit from
normalized input.
• Standardization: When working with algorithms like linear regression, logistic
regression, and SVMs, which work best with zero-mean, unit-variance features,
especially when the data is approximately normally distributed.
• Robust Scaling: When your dataset contains outliers that could distort the
scaling.
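To illustrate the last point, here is a small sketch (using hypothetical one-column data with a single extreme value of 100) comparing how standardization and robust scaling react to an outlier: the outlier inflates the mean and standard deviation, compressing the normal points under StandardScaler, while the median and IQR used by RobustScaler are largely unaffected.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Hypothetical data: five ordinary values plus one outlier
data = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [100.0]])

# Standardization: the outlier inflates the mean and std,
# squeezing the five ordinary points into a narrow band
std_scaled = StandardScaler().fit_transform(data)

# Robust scaling: the median and IQR ignore the outlier,
# so the ordinary points keep a usable spread
robust_scaled = RobustScaler().fit_transform(data)

print("Standardized:  ", std_scaled.ravel().round(2))
print("Robust scaled: ", robust_scaled.ravel().round(2))
```

Printing both arrays shows the ordinary points spanning a much wider (more informative) range after robust scaling than after standardization.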
What is Normal Distribution?
• Symmetry: The distribution is symmetric around the mean, meaning the left
and right halves of the curve are mirror images of each other.
• Bell-Shaped Curve: The highest point on the curve is at the mean, and the
tails of the distribution approach the horizontal axis but never touch it.
• Mean, Median, and Mode: In a normal distribution, the mean, median, and
mode of the data are all equal and located at the center of the distribution.
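For reference, the probability density function of the normal distribution with mean μ and standard deviation σ is:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The symmetry and bell shape described above follow directly from this form: the exponent depends only on the squared distance from the mean.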
68-95-99.7 Rule:
• In a normal distribution, 68% of the data falls within 1 standard deviation (σ) of
the mean (μ).
• 95% falls within 2 standard deviations.
• 99.7% falls within 3 standard deviations.
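The 68-95-99.7 rule can be checked empirically; a minimal sketch, drawing a large standard-normal sample with numpy (the seed and sample size are arbitrary choices):

```python
import numpy as np

# Draw a large sample from the standard normal distribution
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Fraction of samples within k standard deviations of the mean
for k in (1, 2, 3):
    frac = np.mean(np.abs(samples) <= k)
    print(f"Within {k} sigma: {frac:.3f}")
```

With a sample this large, the printed fractions come out close to 0.683, 0.954, and 0.997, matching the rule.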