Unit 2 - Part A
Dimensionality Reduction
What is an Anomaly?
• Management software can be used to monitor the operational
performance of applications, and Key Performance Indicators (KPIs)
can be used to evaluate the success of the organization
• Within a given dataset, there are data patterns that represent business
as usual
• An unexpected change within these data patterns, or an event that
does not conform to the expected data pattern, is considered an
anomaly
• In other words, an anomaly is a deviation from business as usual
Topics to be covered
• Introduction to anomaly (outlier) detection
• Types of anomaly detection
• Applications of outlier detection
• Proximity based Outlier detection: distance and density based outlier
detection
• One class SVM
• Principal Component Analysis (PCA)
• Applications of PCA
• Autoencoders: Denoising Autoencoders, Variational Autoencoders
• Applications of Autoencoders
What is an Anomaly?
• It is not unusual for an e-commerce website to collect a large
amount of revenue on specific days, such as during the festival season,
because sales volume is high at that time
• It would be an anomaly if the company did not have a high sales
volume on these days, especially if festival sales in previous years
were very high
• An event can be an anomaly if it breaks a pattern that is normal for
the data from that particular metric
• Anomalies aren't categorically good or bad: they are deviations from
the expected value for a metric at a given point in time
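The idea of "deviation from the expected value for a metric" can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `is_anomalous`, the revenue figures in `festival_history`, and the threshold of 3 standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations away from the mean of `history` (assumed rule of thumb)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values are identical: any different value is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical festival-day revenue from previous years (arbitrary units).
festival_history = [980, 1020, 1010, 990, 1000]

# High festival revenue matches the usual pattern, so it is not flagged.
print(is_anomalous(festival_history, 1005))

# Low festival revenue breaks the pattern, so it is flagged.
print(is_anomalous(festival_history, 300))
```

Note that the check is two-sided: an unusually high value would be flagged as well, which matches the point that anomalies are not categorically good or bad.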
Introduction
• Usually, anomalies are undetectable
by a human expert
• Anomalous items/events are also
called outliers
• Anomalous data can indicate critical
incidents, such as a technical glitch,
or potential opportunities
Outliers
Global economy
Contextual (Conditional) Outliers
• A data point is a contextual outlier
if its value significantly deviates
from the rest of the data points
in the same context
• The same value may not be
considered an outlier if it
occurred in a different context
• Common in time-series data
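A small sketch can make the role of context concrete. It assumes each record carries an explicit context label and scores every value only against the other values from the same context. The function name `contextual_outliers`, the temperature readings, and the threshold of 2 standard deviations are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def contextual_outliers(records, threshold=2.0):
    """Return the (context, value) records whose value lies more than
    `threshold` sample standard deviations from the mean of their own context."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        values = by_context[context]
        if len(values) < 2:
            continue  # not enough data to estimate a spread for this context
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical daily temperatures (degrees Celsius), labelled by season.
readings = (
    [("summer", t) for t in [30, 31, 29, 32, 30, 31, 30, 29]]
    + [("winter", t) for t in [2, 3, 1, 2, 3, 1, 2, 30]]
)

# 30 degrees is ordinary in summer but is flagged in winter.
print(contextual_outliers(readings))
```

Because the outlying value is included when the mean and standard deviation of its context are estimated, a very small context can mask an outlier; a robust statistic such as the median would be a reasonable alternative.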
[Figure: two-dimensional scatter plot (axis X) with regions of normal behaviour N1 and N2, and anomalies o1 and o2]
• Points in region O3 are anomalies
Key Challenges in Anomaly Detection
• Defining a representative normal region is challenging
• The boundary between normal and outlying behaviour is often not
precise
• The exact notion of an outlier is different for different application
domains
• Availability of labelled data for training/validation
• Malicious adversaries
• Data might contain noise
• Normal behaviour keeps evolving
Aspects of Anomaly Detection Problem
• Nature of attributes
• Binary
[Table: example records with columns Tid, SrcIP, Duration, Dest IP, Number of bytes, Internal]
*Outlier Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report TR07-17, University of Minnesota
Taxonomy of Anomaly Detection Approaches
[Figure: taxonomy diagram with nodes "Anomaly Detection" and "Point Anomaly Detection"]
• If a point Xi lies within the K nearest neighbours of Xj, the reachability
distance of Xi from Xj is the K-distance of Xj (blue line); otherwise it is the
actual distance between Xi and Xj (orange line).
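The two cases of this rule amount to taking the larger of the K-distance of Xj and the distance between Xi and Xj. The sketch below illustrates this under a few assumptions: Euclidean distance, points given as coordinate tuples, and hypothetical helper names `k_distance` and `reachability_distance`.

```python
import math


def k_distance(points, j, k):
    """Distance from points[j] to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(
        math.dist(points[j], p) for idx, p in enumerate(points) if idx != j
    )
    return dists[k - 1]


def reachability_distance(points, i, j, k):
    """Reachability distance of points[i] from points[j]:
    max(K-distance of Xj, distance between Xi and Xj)."""
    return max(k_distance(points, j, k), math.dist(points[i], points[j]))


# Hypothetical 2-D points; index 0 plays the role of Xj.
points = [(0, 0), (1, 0), (0, 2), (5, 5)]
k = 2

# Point 1 is inside the K-neighbourhood of point 0,
# so the K-distance of point 0 is returned.
print(reachability_distance(points, 1, 0, k))

# Point 3 is outside the K-neighbourhood of point 0,
# so the actual distance between the two points is returned.
print(reachability_distance(points, 3, 0, k))
```

Replacing small distances by the K-distance smooths the values for points that are close to Xj, so all points inside its K-neighbourhood are treated as equally reachable.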
Local Outlier Factor (LOF)