Anomaly Detection
Anomaly/Outlier Detection
• Working assumption:
• There are considerably more “normal” observations than “abnormal”
observations (outliers/anomalies) in the data
Types of Anomalies/Outliers
• Univariate Outliers – Data points that are significantly different from the other data points in a single variable
• Considering a city's temperatures in summer, a recorded value of around 100 degrees Celsius on a particular day, far beyond any plausible reading, would be a univariate outlier
• Z-Score Method – The Z-score measures how many standard deviations a data point lies from the mean: Z = (x − μ) / σ
• A positive Z-score means that the data point is above the mean
• A negative Z-score means that the data point is below the mean
• A Z-score close to 0 means that the data point is close to the mean
• In general, a data point with a Z-score greater than 3 or less than -3 is considered anomalous (see the sketch below)
• Assumption: this method requires the data to be approximately normally distributed
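A minimal NumPy sketch of this rule (the temperature values are illustrative, not real measurements):

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return indices of points whose |Z-score| exceeds the threshold.
    Assumes the data are approximately normally distributed."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > threshold)[0], z

# Illustrative summer temperatures; the 100-degree reading is the anomaly
temps = [31, 33, 30, 32, 34, 29, 31, 33, 32, 30, 35, 28, 31, 100]
idx, z = zscore_outliers(temps)
print(idx, z[idx])  # flags the last point (Z-score around 3.6)
```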
Anomaly Detection Schemes
Univariate Outliers
• Interquartile Range (IQR) Method and Box-and-Whisker Plot – Flags points that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1 covers the middle 50% of the data
• Case 2 – The data consists of the number of goals scored by the top goal scorer in every World Cup from 1930 through 2018 (21 competitions in total). Determine the anomalies in these tallies (a worked sketch follows)
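A short sketch of the IQR rule applied to Case 2. The tallies below are approximate and included only for illustration (13 is Just Fontaine's 1958 record):

```python
import numpy as np

# Approximate top-scorer goal tallies per World Cup, 1930-2018 (21 values)
goals = [8, 5, 7, 9, 11, 13, 4, 9, 10, 7, 6, 6, 6, 6, 6, 6, 8, 5, 5, 6, 6]

q1, q3 = np.percentile(goals, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker fences
outliers = [g for g in goals if g < lower or g > upper]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("Anomalous tallies:", outliers)  # [13] with these illustrative numbers
```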
Anomaly Detection Schemes
Multivariate Outliers
• Isolation Forest Method – Isolation Forest is an unsupervised machine learning algorithm for anomaly detection, built from an ensemble of decision trees
• Samples that travel deeper into the tree are less likely to be anomalies, as they require more cuts to isolate. Conversely, samples that end up in shorter branches indicate anomalies, as the tree separates them from the other observations more easily
• When the decision tree is created, it takes fewer nodes to reach an outlier than to reach a normal data point
• This method detects anomalies directly through isolation: how easily a data point can be separated from the rest of the data
• The Isolation Forest builds binary trees that recursively generate partitions by randomly selecting a feature and then randomly selecting a split value for that feature. The partitioning continues until every data point is separated from the rest of the samples (see the sketch below)
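A minimal scikit-learn sketch of the method on synthetic 2-D data (the cluster parameters and the contamination value are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))   # dense "normal" cloud
anomalies = rng.uniform(low=6.0, high=9.0, size=(5, 2))  # five far-away points
X = np.vstack([normal, anomalies])

# contamination = expected fraction of anomalies (our assumption here)
clf = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = clf.fit_predict(X)        # -1 = anomaly, 1 = inlier
scores = clf.decision_function(X)  # lower score = easier to isolate

print("Flagged indices:", np.where(labels == -1)[0])  # should include 300-304
```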
Isolation Forest Method
The algorithm will start building binary decision trees by
randomly splitting the dataset
After that, the algorithm will split the data randomly again and continue building the decision tree
The same process of random splitting continues until all the data points are separated. An isolated point may take only three splits to separate, while points that lie close to one another need many more splits (iterations) before they are separated
The algorithm creates a random forest of such decision trees and calculates the average number of splits needed to isolate each data point. The lower the number of splits needed to isolate a point, the more likely it is to be an outlier (a toy illustration follows)
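The averaging step can be illustrated with a toy one-dimensional version of a single isolation tree (random split values only; a real isolation forest also samples features and subsamples the data):

```python
import numpy as np

def isolation_depth(x, data, rng, max_depth=50):
    """Number of random splits needed to isolate the value x."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        split = rng.uniform(data.min(), data.max())
        data = data[data < split] if x < split else data[data >= split]
        depth += 1
    return depth

values = np.append(np.random.default_rng(1).normal(0, 1, 100), 10.0)

def avg_depth(x, trials=200):
    return np.mean([isolation_depth(x, values, np.random.default_rng(s))
                    for s in range(trials)])

print("outlier (10.0):", avg_depth(10.0))      # isolated in very few splits
print("typical point:", avg_depth(values[0]))  # needs many more splits
```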
A data point that needs many splits to isolate is not an outlier; a data point that is isolated in very few splits is an outlier
Anomaly Detection Schemes
Multivariate Outliers
• Local Outlier Factor (LOF) Algorithm – LOF is an algorithm
used for unsupervised outlier detection
• It produces an anomaly score for each observation that reflects how isolated the point is relative to its neighbourhood
• For each point, compute the density of its local neighbourhood
• When a point is considered an outlier based on its local neighbourhood, it is a local outlier. LOF identifies outliers by considering the density of the neighbourhood
• Outliers are points with the largest LOF values
LOF Algorithm
• LOF is based on the following:
K-distance and K-neighbours
Reachability Distance (RD)
Local Reachability Density (LRD)
Local Outlier Factor (LOF)
• K-distance is the distance between a point and its Kᵗʰ nearest neighbour. The K-neighbours, denoted Nₖ(A), are the set of points that lie in or on the circle of radius K-distance around A
In other words, the reachability distance of Xi from Xj is RDₖ(Xi, Xj) = max(K-distance(Xj), d(Xi, Xj)): if Xi lies within the K-neighbours of Xj, the reachability distance is the K-distance of Xj; otherwise it is the actual distance between Xi and Xj (a small sketch follows)
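A small NumPy/scikit-learn sketch that computes these quantities straight from the definitions (a naive O(n²) version, for illustration only):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def reachability_distances(X, k):
    """RD[i, j] = RD_k(Xi, Xj) = max(K-distance(Xj), d(Xi, Xj))."""
    # k+1 neighbours, because each point's nearest neighbour is itself
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    k_dist = dist[:, -1]  # each point's K-distance
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise d(Xi, Xj)
    return np.maximum(k_dist[None, :], d)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])
print(reachability_distances(X, k=2))
```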
• Local Reachability Density (LRD) – The inverse of the average reachability distance from a point to its K-neighbours: LRDₖ(A) = 1 / (mean of RDₖ(A, X) over X in Nₖ(A))
This tells how far a point is from the nearest cluster of points: low values of LRD imply that the closest cluster is far from the point
• Local Outlier Factor (LOF) – The ratio of the average LRD of a point's K-neighbours to the point's own LRD: LOFₖ(A) = (mean of LRDₖ(X) over X in Nₖ(A)) / LRDₖ(A)
LOF ≈ 1 => point has a density similar to its neighbours
LOF < 1 => Inlier (similar data point that is inside the density cluster)
LOF >> 1 => Outlier (point lies in a much sparser region than its neighbours)
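In practice all four steps are wrapped up in scikit-learn's LocalOutlierFactor; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # dense cluster
               [[5.0, 5.0], [6.0, -4.0]]])   # two isolated points

lof = LocalOutlierFactor(n_neighbors=20)     # K = 20
labels = lof.fit_predict(X)                  # -1 = outlier, 1 = inlier
lof_values = -lof.negative_outlier_factor_   # sklearn stores the negated LOF

print("Outlier indices:", np.where(labels == -1)[0])  # expect 100 and 101
print("Their LOF values:", lof_values[labels == -1])  # well above 1
```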
• Case Study: Credit Card Fraud Detection – This dataset contains transactions that occurred over two days, with 492 frauds out of 284,807 transactions
• It contains only numerical input variables. The feature 'Time' contains the seconds
that have elapsed between each transaction and the first transaction in the dataset.
The feature 'Amount' is the transaction Amount. Feature 'Class' is the target
variable, and it takes value 1 in case of fraud and 0 otherwise
• The remaining features (V1–V28) are numerical values produced by a PCA transformation; the original attributes (such as cardholder and card details) are withheld for confidentiality
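A sketch of applying Isolation Forest to this dataset (it assumes the Kaggle credit-card fraud CSV is saved locally as creditcard.csv; exact results vary with the random seed):

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("creditcard.csv")  # assumed local copy of the dataset
X = df.drop(columns=["Class"])
y = df["Class"]                     # 1 = fraud, 0 = normal

# Use the known fraud rate (492 / 284,807) as the contamination estimate
clf = IsolationForest(n_estimators=100,
                      contamination=492 / 284_807,
                      random_state=0)
pred = clf.fit_predict(X)           # -1 = flagged as anomalous
caught = ((pred == -1) & (y == 1)).sum()
print(f"Frauds flagged: {caught} of {y.sum()}")
```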