Anomaly or Outlier Detection
Challenges
– How many outliers are there in the data?
– The method is unsupervised
  Validation can be quite challenging (just like for clustering)
– Finding a needle in a haystack
Working assumption:
– There are considerably more “normal” observations than “abnormal” observations (outliers/anomalies) in the data
Anomaly Detection Schemes
General Steps
– Build a profile of the “normal” behavior
  The profile can be patterns or summary statistics for the overall population
– Use the “normal” profile to detect anomalies
  Anomalies are observations whose characteristics differ significantly from the normal profile
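The two general steps can be illustrated with a minimal sketch, assuming the "normal" profile is just the mean and sample standard deviation of a single numeric attribute. The function names, the sample readings, and the threshold of 2 standard deviations are illustrative choices, not part of the original material.

```python
import statistics

def build_profile(values):
    """Step 1: summarize "normal" behavior as (mean, sample std deviation)."""
    return statistics.mean(values), statistics.stdev(values)

def find_anomalies(values, profile, threshold=2.0):
    """Step 2: flag values that lie more than `threshold` std deviations from the mean."""
    mean, std = profile
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical sensor readings; the last one is far from the rest.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
profile = build_profile(readings)
anomalies = find_anomalies(readings, profile)
print(anomalies)
```

Because the profile is computed from data that still contains the anomaly, the standard deviation is inflated; a more robust summary (for example the median) may be preferable in practice.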
Limitations
– Time consuming
– Subjective
Convex Hull Method
Approach:
– Compute the distance between every pair of data points
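One possible way to turn pairwise distances into an outlier score is to take each point's distance to its k-th nearest neighbor. The sketch below assumes Euclidean distance and small data (it visits all pairs, so the cost grows quadratically); the function name and the choice k=2 are illustrative.

```python
import math

def knn_distance_scores(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four points on a unit square plus one point far away (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = knn_distance_scores(points, k=2)
# The point with the largest score is the strongest outlier candidate.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(points[most_anomalous], scores[most_anomalous])
```

A larger score means the point sits in a sparser region; a cutoff on the score (or a top-n rule) would still have to be chosen by the analyst.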
In the nearest-neighbor (NN) approach, p2 is not considered an outlier, while the local outlier factor (LOF) approach finds both p1 and p2 as outliers.
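LOF differs from a plain nearest-neighbor distance because it compares the density around a point with the density around that point's neighbors. The following is a minimal pure-Python sketch of that idea, assuming Euclidean distance, no duplicate points, and exactly k neighbors per point (ties in distance are not handled specially). The function name and test data are illustrative.

```python
import math

def lof_scores(points, k):
    """Local outlier factor for each point (values near 1 mean 'as dense as its neighbors')."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_dist = [], []
    for i in range(n):
        # Indices of the other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th nearest neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Four points on a unit square plus one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
lof = lof_scores(points, k=2)
print([round(s, 2) for s in lof])
```

Points inside a uniformly dense group score close to 1, while a point whose neighbors are much denser than itself scores well above 1, which is how a point near a dense cluster can be flagged even when its absolute distance to its neighbors is modest.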
Clustering-Based
Basic idea:
– Cluster the data into groups of different density
– Choose points in small clusters as candidate outliers
– Compute the distance between candidate points and non-candidate clusters
  If candidate points are far from all other non-candidate points, they are outliers
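The clustering-based idea can be sketched as follows, assuming cluster labels have already been produced by some clustering algorithm. The parameters `min_size` (what counts as a "small" cluster) and `dist_threshold` (what counts as "far") are hypothetical and would need to be chosen for the data at hand.

```python
import math
from collections import Counter

def cluster_based_outliers(points, labels, min_size, dist_threshold):
    """Flag points in small clusters that are far from every non-candidate point."""
    sizes = Counter(labels)
    small = {lab for lab, count in sizes.items() if count < min_size}

    # Candidates are the members of small clusters; everything else is non-candidate.
    non_candidates = [p for p, lab in zip(points, labels) if lab not in small]

    outliers = []
    for p, lab in zip(points, labels):
        if lab not in small:
            continue
        # A candidate is an outlier only if it is far from ALL non-candidate points.
        # Note: with no non-candidate points, every candidate is flagged.
        if all(math.dist(p, q) > dist_threshold for q in non_candidates):
            outliers.append(p)
    return outliers

# Made-up data: two regular clusters and two single-point clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1),   # cluster 0
          (10, 10), (10, 11), (11, 10),     # cluster 1
          (5, 20),                          # cluster 2 (small, isolated)
          (2, 1)]                           # cluster 3 (small, next to cluster 0)
labels = [0, 0, 0, 0, 1, 1, 1, 2, 3]
found = cluster_based_outliers(points, labels, min_size=3, dist_threshold=3.0)
print(found)
```

In this example the point next to cluster 0 is a candidate because its cluster is small, but it is not reported because it lies close to non-candidate points.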