Anomaly Detection

ANOMALY DETECTION
- Prof Madhurima Paul

Financial transactions

Normal: Routine purchases and consistent spending by an individual in London.

Outlier: A massive withdrawal from Ireland from the same account, hinting at potential fraud.

Network traffic in cybersecurity

Normal: Regular communication, steady data transfer, and adherence to protocol.

Outlier: Abrupt increase in data transfer or use of unknown protocols, signaling a potential breach or malware.

Patient vital signs monitoring

Normal: Stable heart rate and consistent blood pressure.

Outlier: Sudden increase in heart rate and decrease in blood pressure, indicating a potential emergency or equipment failure.

The Importance of Anomaly Detection in Data Science

Data is the most precious commodity in data science, and anomalies are
the most disruptive threats to its quality. Bad data quality means bad:

• Statistical tests
• Dashboards
• Machine learning models
• Decisions
Types of Anomalies
• Anomaly detection encompasses two broad practices: outlier detection and novelty detection.

• Identifying the type of anomalies is crucial as it allows you to choose the right algorithm to detect them.

Example
Imagine that a city installs a new, more accurate weather monitoring station. As a result, its temperature dataset starts consistently recording slightly higher temperatures, ranging from 25°C to 35°C. This sustained increase in temperatures is a novelty: a new pattern introduced by the improved monitoring system.
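
The difference between an isolated outlier and a sustained novelty can be sketched in a few lines of Python. This is only an illustrative heuristic, not a standard algorithm: the function name classify_window, the 3-standard-deviation band, the 50% cutoff, and all readings are hypothetical (only the 25°C to 35°C range comes from the example above).

```python
import statistics

def classify_window(baseline, window, k=3.0, sustained_fraction=0.5):
    """Label a window of new readings relative to a baseline (hypothetical heuristic)."""
    mean = statistics.fmean(baseline)
    std = statistics.stdev(baseline)
    low, high = mean - k * std, mean + k * std
    # Count how many new readings fall outside the baseline band.
    outside = sum(1 for v in window if v < low or v > high)
    if outside == 0:
        return "normal"
    # Most of the window is outside the band: a sustained new pattern.
    if outside / len(window) >= sustained_fraction:
        return "novelty"
    # Only a few readings are outside the band: isolated extreme values.
    return "outlier"

# Hypothetical temperature readings in degrees Celsius.
baseline = [20, 21, 22, 23, 24, 22, 21, 23, 22, 20, 24, 23]
after_new_station = [27, 29, 31, 28, 30, 33, 26, 32]  # sustained shift
single_spike = [22, 21, 45, 23, 22, 20, 24, 21]       # one extreme value

print(classify_window(baseline, after_new_station))  # prints: novelty
print(classify_window(baseline, single_spike))       # prints: outlier
```

In practice the cutoff values would need tuning for the data at hand; the point is only that a novelty shows up as a persistent change, while an outlier is a rare exception.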
Types of Outliers

Just as there are two types of anomalies, there are two types of outliers: univariate and multivariate. Each type calls for different detection algorithms.

1. Univariate outliers exist in a single variable or feature in isolation. They are extreme or abnormal values that deviate from the typical range of values for that specific feature.
2. Multivariate outliers are found by combining the values of multiple variables at the same time. Each value may look normal on its own, while the combination is unusual.
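
To make the distinction concrete, the sketch below builds a point that is unremarkable in each feature separately but unusual as a combination. It uses the Mahalanobis distance, a technique not named in these slides and chosen here only because it fits in a few lines. The data, the helper mahalanobis_2d, and the cutoff of 3 are assumptions for illustration.

```python
import math
import statistics

def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2-D point from the centre of `data`."""
    xs = [p[0] for p in data]
    ys = [p[1] for p in data]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(data)
    # Sample variances and covariance of the two features.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d^T * inverse(covariance) * d, written out for 2x2.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical data: two strongly correlated features (y is close to x).
normal = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
          (6, 6.0), (7, 6.9), (8, 8.2), (9, 8.8), (10, 10.1)]
candidate = (2, 9)  # low x combined with high y

# Univariate view: z-score of each coordinate against its own feature.
feature_z = [
    abs(candidate[i] - statistics.fmean(col)) / statistics.stdev(col)
    for i, col in enumerate(zip(*normal))
]
# Multivariate view: distance that accounts for the correlation.
candidate_distance = mahalanobis_2d(candidate, normal)

print(feature_z)           # both z-scores are well below 3
print(candidate_distance)  # far above 3
```

Both coordinates of the candidate sit inside the usual range of their feature, so a univariate check would pass it; only the joint view exposes it.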
Anomaly Detection Methods
For univariate outlier detection, the most popular methods are:

1. Z-score (standard score): the z-score measures how many standard deviations a data point lies from the mean. Generally, instances whose z-score has an absolute value over 3 are flagged as outliers.

2. Interquartile range (IQR): the IQR is the distance between the first quartile (Q1) and the third quartile (Q3) of a distribution. An instance that falls below Q1 or above Q3 by more than some multiple of the IQR is considered an outlier. The most common multiplier is 1.5, so any value outside the range [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] is an outlier.

3. Modified z-scores: similar to z-scores, but computed from the median and a measure called Median Absolute Deviation (MAD) instead of the mean and standard deviation. Since the mean and standard deviation are easily skewed by outliers, modified z-scores are generally considered more robust.
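
The three univariate methods above can be sketched with Python's standard library alone. This is a minimal illustration, not a reference implementation. The function names and the sample readings are hypothetical, and the constant 0.6745 and the threshold 3.5 for modified z-scores are commonly used conventions that do not come from these slides.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [v for v in values if std > 0 and abs(v - mean) / std > threshold]

def iqr_outliers(values, multiplier=1.5):
    """Values outside [Q1 - m * IQR, Q3 + m * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < low or v > high]

def modified_zscore_outliers(values, threshold=3.5):
    """Values whose median/MAD-based score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values
            if mad > 0 and 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical readings: twenty typical values plus one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11] * 2 + [95]

print(zscore_outliers(readings))           # prints: [95]
print(iqr_outliers(readings))              # prints: [95]
print(modified_zscore_outliers(readings))  # prints: [95]
```

On very small samples a single extreme value inflates the standard deviation so much that its own z-score may never reach 3, which is one reason the median-based variant is considered more robust.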
For multivariate outliers, we generally use machine learning algorithms. Because they consider several features jointly, they are able to find intricate patterns in complex datasets:

1. Isolation Forest: uses a collection of isolation trees (similar to decision trees) that recursively partition the dataset until each instance is isolated. The instances that get isolated the quickest are considered outliers.

2. Local Outlier Factor (LOF): LOF measures the local density deviation of a sample compared to its neighbours. Points with significantly lower density than their neighbours are flagged as outliers.

3. Clustering techniques: techniques such as k-means or hierarchical clustering divide the dataset into groups. Points that lie far from every group or end up in their own small clusters are considered outliers.

4. Angle-based Outlier Detection (ABOD): ABOD measures the
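
As an illustration of the Local Outlier Factor idea from item 2, the sketch below computes LOF scores in plain Python. It is a simplified version under stated assumptions: it takes exactly k neighbours per point (ignoring ties in distance), assumes no duplicate points, and uses hypothetical data and a hypothetical function name lof_scores. Scores near 1 mean a point is about as dense as its neighbours; scores well above 1 suggest an outlier.

```python
import math

def lof_scores(points, k=3):
    """Simplified Local Outlier Factor for a small list of points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []  # indices of the k nearest neighbours of each point
    k_dist = []      # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a tight cluster and one distant point (the last one).
points = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (8, 8)]
scores = lof_scores(points, k=3)

for point, score in zip(points, scores):
    print(point, round(score, 2))
```

The distant point receives by far the largest score because its neighbours are much denser than it is, while the cluster members stay close to 1.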
