
Anomaly Detection

Anomaly detection algorithms look at an unlabeled data set and determine unusual
or abnormal data points or events. Common use cases include applications where
unusual activity is a problem, such as fraud detection, quality analysis in
manufacturing, and monitoring computers in data centers.

Density Estimation
The most common way to carry out anomaly detection is with an algorithm called
density estimation. Given a dataset {x^(1), x^(2), …, x^(m)}, the density
estimation algorithm computes a model p(x), the probability of x being observed
in the dataset.

If this were graphed, boundaries would form around the data, corresponding to
various probabilities. As a data point moves closer to the "center" of the data,
it has a higher probability; likewise, as it moves further away, its probability
decreases.

In order to detect an anomaly, a cutoff point must be set, represented by ϵ.
This is usually a small number, depending on the application. New data is fed
into the model and its probability is compared to this value; each new example
is then labeled as an anomaly or not. This relationship is represented as,

anomaly if p(x_test) < ϵ

Gaussian (Normal) Distribution


In order to find an optimal probability cutoff ϵ and apply anomaly detection,
the Gaussian, or normal, distribution is used. This is commonly referred to as
a bell curve. Given a value x, the probability of x is determined by a Gaussian
with mean μ and variance σ², where σ is the standard deviation. Since
probabilities always sum to 1, the area under the curve is also equal to 1.

The equation for the probability as a function of x is as follows,

p(x) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²))

To compute the mean, the values are simply averaged,

μ = (1/m) Σ_{i=1}^{m} x^(i)

The standard deviation measures how far the data points lie from the mean, and
the variance is the average of these squared differences,

σ² = (1/m) Σ_{i=1}^{m} (x^(i) − μ)²
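As a sketch, the mean, variance, and bell-curve probability above can be computed directly with NumPy; the data values here are made up for illustration:

```python
import numpy as np

# Hypothetical 1-D feature data; the last value is an obvious outlier.
x = np.array([4.9, 5.1, 5.0, 4.8, 5.2, 9.7])

# Fit the Gaussian parameters exactly as in the formulas above.
mu = x.mean()                      # mu = (1/m) * sum(x_i)
sigma2 = np.mean((x - mu) ** 2)    # sigma^2 = (1/m) * sum((x_i - mu)^2)

def gaussian_pdf(x, mu, sigma2):
    """p(x; mu, sigma^2) from the bell-curve equation above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

print(gaussian_pdf(x, mu, sigma2))  # the outlier gets a much smaller p(x)
```

Note that the outlier both inflates the fitted variance and still lands far enough in the tail to receive a much lower probability than the clustered points.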

Defining the Algorithm


Taking the density estimation and Gaussian distribution, a more formal definition
for the anomaly detection algorithm is made:

Given a training set: {x(1) , x(2) , … , x(m) }, where each example x(i) has n
features,

    ⎡ x^(1) ⎤        ⎡ x_1 ⎤
    ⎢ x^(2) ⎥        ⎢ x_2 ⎥
X = ⎢   ⋮   ⎥    x = ⎢  ⋮  ⎥
    ⎣ x^(m) ⎦        ⎣ x_n ⎦

The model for the probability is defined as follows, where the means μ_j and
variances σ_j² are parameters to the probability function,

p(x) = p(x_1; μ_1, σ_1²) · p(x_2; μ_2, σ_2²) ⋯ p(x_n; μ_n, σ_n²)

Note that this models the features as statistically independent, though the
algorithm often works well even when they are not.

A more compact way to write this function is below,

p(x) = ∏_{j=1}^{n} p(x_j; μ_j, σ_j²)

Putting it all together


1. Choose n features x_j that might be indicative of anomalous examples.

2. Fit parameters μ_1, …, μ_n and σ_1², …, σ_n²:

μ_j = (1/m) Σ_{i=1}^{m} x_j^(i)

σ_j² = (1/m) Σ_{i=1}^{m} (x_j^(i) − μ_j)²

Vectorized formula:

μ = (1/m) Σ_{i=1}^{m} x^(i),  where μ = (μ_1, μ_2, …, μ_n)ᵀ

3. Given new example x, compute p(x):

p(x) = ∏_{j=1}^{n} p(x_j; μ_j, σ_j²) = ∏_{j=1}^{n} (1 / (√(2π) σ_j)) · exp(−(x_j − μ_j)² / (2σ_j²))

4. Determine an anomaly if p(x) < ϵ.
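The four steps can be sketched end to end as follows; the training data, the ϵ value, and the test points are all assumptions for illustration:

```python
import numpy as np

def fit_gaussian(X):
    """Step 2: fit mu_j and sigma_j^2 per feature (vectorized over columns)."""
    mu = X.mean(axis=0)
    sigma2 = np.mean((X - mu) ** 2, axis=0)
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 3: p(x) as the product of per-feature Gaussians."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

# Hypothetical training data: mostly normal points around (5, 10).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5, 10], scale=[1, 2], size=(500, 2))

mu, sigma2 = fit_gaussian(X_train)

# Step 4: flag examples whose probability falls below epsilon.
epsilon = 1e-4  # assumed cutoff; in practice tune it on a CV set
X_new = np.array([[5.1, 9.8],    # typical point
                  [0.0, 25.0]])  # far from the training data
is_anomaly = p(X_new, mu, sigma2) < epsilon
print(is_anomaly)  # expect [False, True]
```

The typical point sits near the fitted mean and gets a probability well above ϵ, while the distant point falls many standard deviations out on both features and is flagged.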

Evaluating the Model’s Performance

When developing any learning algorithm (choosing features, etc.), making
decisions is much easier if there is a way to evaluate the algorithm. A common
technique is to produce a single number that serves as an evaluation metric to
gauge performance. This process is called real-number evaluation.

In the case of anomaly detection, there is one needed step, which is to assume
there is some labeled data of anomalous (y = 1) and non-anomalous, normal
(y = 0) examples. The training set should contain only normal data (y = 0); if
a few anomalous examples slip in, that is fine. Then cross validation and test
sets are created. Ideally, both of these sets should contain mostly normal
examples and a few anomalous examples.

y = 1 if p(x) < ϵ (anomaly)
y = 0 if p(x) ≥ ϵ (normal)

Cross validation set: (x_cv^(1), y_cv^(1)), …, (x_cv^(m_cv), y_cv^(m_cv))

Test set: (x_test^(1), y_test^(1)), …, (x_test^(m_test), y_test^(m_test))

After the algorithm has been trained on the training set, the cross validation
set is used to fine-tune the probability boundary ϵ. Epsilon can be tuned by
looking at which data was incorrectly labeled. Fine-tuning the features x_j is
also an option here. Once all parameters have been optimized, the model can be
run on the test set for a final verdict.

Alternatively, there are times when it is best to use only a cross validation
set and no test set; for example, if the training data contains very few
labeled anomalous examples or the data set itself is small. The downside is
that the model cannot be tested after the parameters are fine-tuned, so there
is a higher risk of overfitting.

Possible evaluation metrics


Just like other classification problems, there are alternative metrics to help
evaluate the model. These metrics are particularly useful for very skewed data
and were previously mentioned in detail in Practical Machine Learning tips.

True positive, false positive, false negative, true negative

Precision/Recall

F1-score

Regardless of what approach is used, the key is to find the best value for the
probability boundary ϵ.
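One common way to pick ϵ is to scan candidate values and keep the one with the best F1-score on the cross validation set. A minimal sketch, with made-up CV probabilities and labels:

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Scan candidate cutoffs; keep the one with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = p_cv < eps                       # predicted anomalies
        tp = np.sum((preds == 1) & (y_cv == 1))  # true positives
        fp = np.sum((preds == 1) & (y_cv == 0))  # false positives
        fn = np.sum((preds == 0) & (y_cv == 1))  # false negatives
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Hypothetical CV probabilities p(x_cv) and labels (1 = anomaly).
p_cv = np.array([0.09, 0.08, 0.10, 0.07, 1e-5, 2e-5])
y_cv = np.array([0, 0, 0, 0, 1, 1])
eps, f1 = select_epsilon(p_cv, y_cv)
print(eps, f1)  # a cutoff separating the two anomalies gives F1 = 1.0
```

F1 is a sensible objective here precisely because the labels are skewed: plain accuracy could be maximized by never flagging anything.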

Comparing to Supervised Learning


Since anomaly detection requires labeled data for the cross validation and test
sets (at the very least), wouldn’t it make more sense to use a supervised learning
algorithm? Well, this can sometimes be the case, and the choice between the two
is often subtle.

The first criterion lies in the data itself; specifically, how many of the
examples are positive or negative.

Anomaly detection works best when the dataset has a very small number of
positive examples (y = 1) (0–20 is common) and a large number of negative
(y = 0) examples. Supervised learning works well when the dataset has a large
number of both positive and negative examples, or at least one that is not as
skewed as an anomaly detection dataset.

When future anomalies occur, especially ones that look nothing like any of the
anomalous examples seen so far, the two approaches can be quite different.

Anomaly detection: since the training data contains a large number of negative
examples and few positives, it is hard for the algorithm to learn from the
positive examples what anomalies look like. Therefore, it is more likely to
flag many different types of future anomalies it hasn't seen before.

Supervised learning: enough positive examples exist for the algorithm to get a
sense of what they are like, so future positive examples are likely to be
similar to ones in the training set. Therefore, it is harder to catch types of
anomalies the model hasn't seen before.

A use case to illustrate this difference is manufacturing. Anomaly detection is
more useful for finding new, previously unseen defects, while supervised
learning is more useful for finding known, previously seen defects.

Fine Tuning Features
With supervised learning, if some features are not quite right or not relevant
to the problem, that is usually okay, because the algorithm can figure out
which features to ignore, rescale, etc. However, for anomaly detection, which
runs on unlabeled data, it is harder for the algorithm to account for these
potential problems. Thus, carefully choosing features is especially important.

Transforming features
One important step to enforcing efficient features is to make sure they are
Gaussian. In other words, the data for that feature should closely resemble a
symmetric bell curve. There are several tricks to converting non-gaussian
features, one of which is directly manipulating the data. For example, given
feature data x, one could take log(x), ex , x , or any other mathematical
transformation for that matter.
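A quick way to compare candidate transformations is to check how symmetric each transformed feature looks, for example via sample skewness (a value near 0 suggests a more bell-shaped distribution). A sketch with synthetic right-skewed data:

```python
import numpy as np

# Hypothetical skewed feature (e.g. transaction counts): long right tail.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10_000)

# Candidate transformations to make the feature more Gaussian.
candidates = {
    "x": x,
    "log(x + 1)": np.log1p(x),
    "sqrt(x)": np.sqrt(x),
    "x^(1/3)": np.cbrt(x),
}

# Sample skewness as a rough symmetry check.
for name, t in candidates.items():
    skew = np.mean(((t - t.mean()) / t.std()) ** 3)
    print(f"{name:12s} skewness = {skew:+.2f}")
```

In practice, plotting a histogram of each transformed feature gives the same information visually and is how this check is usually done.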

Error analysis
After modeling the initial training data, if it does not test perform well on the cross
validation set, there is also the option of carrying out error analysis. More simply,
this involves looking at errors and trying to reason why they are behaving that
way.

This usually starts with evaluating the probability function p(x). Remember, the
two conditions for an optimal model are:

1. p(x) is large for normal examples x: p(x) ≥ ϵ

2. p(x) is small for anomalous examples x: p(x) < ϵ

The most common problem uncovered by error analysis is that p(x) is comparable
for normal and anomalous examples (usually a large value for both). When an
anomalous example has a large value for p(x), it can often go undetected and
will not get flagged by the algorithm.

Deriving a new feature x_2 from the problem feature x_1 can help solve this.
The new values do not necessarily have to be directly derived from x_1, but the
two features should be very closely related. When the two features are plotted
against each other, the anomalies can stand out much more clearly.

For example, consider a fraud detection algorithm where the original feature is
the number of transactions, and the additional feature is typing speed (which
would be directly related to making many transactions quickly).

Existing features can also be combined into new features, and mathematical
transformations can be applied as well. For example, a new feature x_3 for a
data center algorithm could be defined as,

x_3 = (CPU load)² / (network traffic) = x_1² / x_2
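This derived feature can be sketched with made-up machine metrics; the machine with high CPU load but little network traffic stands out:

```python
import numpy as np

# Hypothetical data-center metrics per machine.
x1 = np.array([0.6, 0.7, 0.65, 0.9])   # CPU load
x2 = np.array([120., 140., 130., 5.])  # network traffic

# New feature: large when a machine is busy without serving traffic,
# e.g. a process stuck in an infinite loop.
x3 = x1 ** 2 / x2
print(x3)  # the last machine's value is far larger than the others
```

Neither x_1 nor x_2 alone flags the last machine as unusual, but their ratio does, which is exactly the point of deriving combined features.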
