Outlier Detection and Removal

Uploaded by

Niharika Khanna

Outlier

What is an outlier?
An outlier is a data point that deviates significantly from the rest of the data. It can be either
much higher or much lower than the other data points, and its presence can have a significant
impact on the results of machine learning algorithms. Outliers can be caused by measurement or
execution errors. The analysis of outlier data is referred to as outlier analysis or outlier
mining.
Types of Outliers
There are two main types of outliers:
Global outliers: Global outliers are isolated data points that are far away from the main body
of the data. They are often easy to identify and remove.
Contextual outliers: Contextual outliers are data points that are unusual in a specific context
but may not be outliers in a different context. They are often more difficult to identify and
may require additional information or domain knowledge to determine their significance.
Why Should You Detect Outliers?
In the machine learning pipeline, data cleaning and preprocessing is an important step as it
helps you better understand the data. During this step, you deal with missing values, detect
outliers, and more.
As outliers are very different values—abnormally low or abnormally high—their presence
can often skew the results of statistical analyses on the dataset. This could lead to less
effective and less useful models.
But dealing with outliers often requires domain expertise, and none of the outlier detection
techniques should be applied without understanding the data distribution and the use case.
Outlier Detection
How to Detect Outliers Using Standard Deviation
When the data, or certain features in the dataset, follow a normal distribution, you can use the
standard deviation of the data, or the equivalent z-score to detect outliers.
In statistics, standard deviation measures the spread of data around the mean, and in essence,
it captures how far away from the mean the data points are.
For data that is normally distributed, about 68.3% of the data will lie within one standard
deviation of the mean. Close to 95.4% and 99.7% of the data lie within two and three
standard deviations of the mean, respectively.
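The coverage figures for a normal distribution can be checked numerically. The sketch below is only an illustration: it relies on the standard identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z, and the function name `coverage` is a hypothetical helper, not something from the original notes.

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the share of data within one, two and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * coverage(k):.2f}%")
```

The exact values are close to the rounded percentages quoted above. They hold only when the data really is normally distributed.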

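Let’s denote the standard deviation of the distribution by σ, and the mean by μ. The z-score of a data point x is then (x - μ)/σ, that is, the number of standard deviations by which x differs from the mean.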

One approach to outlier detection is to set the lower limit to three standard deviations below
the mean (μ - 3*σ), and the upper limit to three standard deviations above the mean (μ +
3*σ). Any data point that falls outside this range is detected as an outlier.
For normally distributed data, about 99.7% of the values lie within three standard deviations
of the mean, so this rule will flag roughly 0.3% of the dataset as outliers.
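The three-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `three_sigma_outliers` and the sample `readings` are made up for the example, and the population standard deviation (`statistics.pstdev`) is an assumed choice, since the text does not say which estimator to use.

```python
import statistics

def three_sigma_outliers(values):
    """Return (lower, upper, outliers) for the mean +/- 3 standard deviations rule."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    lower = mu - 3 * sigma
    upper = mu + 3 * sigma
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

# Hypothetical sample: twenty readings near 50 plus one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [120]
lower, upper, outliers = three_sigma_outliers(readings)
print(f"limits: [{lower:.2f}, {upper:.2f}]")
print("outliers:", outliers)
```

Note that extreme values inflate both the mean and the standard deviation, so on very small samples this rule may fail to flag anything. That is one reason to check the distribution before relying on it.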
Detecting Outliers Using the Interquartile Range (IQR)
In statistics, the interquartile range (IQR) is the difference between the third and the
first quartiles of a dataset.

• The first quartile is also called the lower quartile, or the 25th percentile.
• If q25 is the first quartile, about 25% of the points in the dataset have values less
than q25.
• The third quartile is also called the upper quartile, or the 75th percentile.
• If q75 is the third quartile, about 75% of the points have values less than q75.
• Using the above notation, IQR = q75 - q25.

The interquartile range method flags all points that lie outside the range [q25 - 1.5*IQR,
q75 + 1.5*IQR] as outliers.
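The IQR rule above might be implemented as follows. This is a sketch under stated assumptions: `iqr_bounds` and the sample `readings` are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`. Other quartile conventions exist and can give slightly different limits on small samples.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) limits q25 - k*IQR and q75 + k*IQR."""
    # quantiles() with n=4 returns the three cut points q25, q50, q75.
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr

# Hypothetical sample with one unusually large value.
readings = [10, 12, 13, 14, 15, 16, 18, 19, 95]
lower, upper = iqr_bounds(readings)
outliers = [x for x in readings if x < lower or x > upper]
print("limits:", (lower, upper))
print("outliers:", outliers)
```

For this sample q25 is 13 and q75 is 18, so the IQR is 5 and the limits are 5.5 and 25.5. Only the value 95 falls outside them.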

• If the data, or the feature of interest, is normally distributed, you may use the standard
deviation and z-score to label points that are farther than three standard deviations
away from the mean as outliers.
• If the data is not normally distributed, you can use the interquartile range or
percentile-based methods to detect outliers.
Removing Outliers
Once outliers have been identified, the next step is to remove them or otherwise treat them.
There are several methods for doing so, including:
•Trimming: This involves removing the data points that fall outside a specified range, or a
fixed percentage of the most extreme values.
•Winsorizing: This involves replacing extreme values with the nearest value that falls within
a specified range, rather than deleting them.
•Imputation: This involves replacing missing or extreme values with a substitute value, such
as the mean or median of the remaining data.
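The three treatments listed above can be compared on the same data. The sketch below is illustrative only: the function names, the sample `readings`, and the limits 5.5 and 25.5 are assumptions made for the example (limits like these could come from the IQR rule). Winsorizing is implemented here as clipping to the nearest in-range data value. Some implementations clip to the limits themselves or to chosen percentiles instead.

```python
import statistics

def trim(values, lower, upper):
    """Drop every value outside [lower, upper]."""
    return [x for x in values if lower <= x <= upper]

def winsorize(values, lower, upper):
    """Replace out-of-range values with the nearest in-range data value."""
    inliers = trim(values, lower, upper)  # assumes at least one value is in range
    lo, hi = min(inliers), max(inliers)
    return [min(max(x, lo), hi) for x in values]

def impute_median(values, lower, upper):
    """Replace out-of-range values with the median of the in-range values."""
    med = statistics.median(trim(values, lower, upper))
    return [x if lower <= x <= upper else med for x in values]

# Hypothetical sample and assumed limits.
readings = [10, 12, 13, 14, 15, 16, 18, 19, 95]
lower, upper = 5.5, 25.5

trimmed = trim(readings, lower, upper)
winsorized = winsorize(readings, lower, upper)
imputed = impute_median(readings, lower, upper)
print(trimmed)
print(winsorized)
print(imputed)
```

Trimming shortens the dataset, while winsorizing and imputation keep its length. That matters when rows must stay aligned with other features.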
