Feature Scaling in Machine Learning
Feature Scaling in Machine Learning
Mean
Median
Mode
1. Outlier: An outlier is a data point or observation that lies significantly outside the
general pattern of the data. It is an observation that is noticeably different from the
other data points and can either be much higher or much lower than the rest of the
dataset. Outliers can occur due to variability in the data, errors in data collection,
or even rare but valid occurrences.
2. How to identify outlier
• Using Visualization Techniques:
a) Box Plot Outliers appear as separate points outside the whiskers.
b) Scatter Plot Helps in spotting extreme values in two-dimensional data.
Remove the outlier from the dataset
a) Z-Score Method (Standard Score):- Measures how far a data point is from the
mean in terms of standard deviations.
Formula
Before Removing
After Removing
ScatterPlot
Boxplot