Anomaly Detection
- Prof Madhurima Paul
Financial transactions
Data is the most precious commodity in data science, and anomalies are
among the most disruptive threats to its quality. Bad data quality leads to bad:
• Statistical tests
• Dashboards
• Machine learning models
• Decisions
Types of Anomalies
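• Anomaly detection encompasses two broad practices: outlier
detection and novelty detection.
• Outlier detection: the data already contains anomalies, and the goal is
to find the observations that deviate from the bulk of the data.
• Novelty detection: a model learns from data assumed to be free of
anomalies and then judges whether new observations follow the same pattern.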
Example
Now, imagine that the city installs a new, more accurate weather
monitoring station. As a result, the dataset starts consistently recording
slightly higher temperatures, ranging from 25°C to 35°C. This
sustained increase in temperatures is a novelty, representing a new
pattern introduced by the improved monitoring system.
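A minimal sketch of the novelty-detection setup, under stated assumptions: the "expected" interval is learned only from old-station readings assumed to be clean, and new readings are then checked against it. The mean ± 3 standard deviations rule, the helper names `fit_range` and `is_novel`, and all temperature values are hypothetical choices for illustration, not taken from these notes.

```python
import statistics

def fit_range(train, k=3.0):
    """Learn an expected interval from clean training data: mean +/- k * std.

    Hypothetical helper; the mean +/- k*std rule is an illustrative stand-in.
    """
    mu = statistics.fmean(train)
    sigma = statistics.stdev(train)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma

def is_novel(x, bounds):
    """Flag a new observation that falls outside the learned interval."""
    low, high = bounds
    return x < low or x > high

# Hypothetical readings (°C) from the old station, assumed free of anomalies.
old_readings = [20.5, 21.0, 22.3, 23.1, 24.0, 22.8, 21.7, 23.5]
bounds = fit_range(old_readings)

# Hypothetical readings from the new station.
new_readings = [27.0, 31.0, 35.0]
print([is_novel(x, bounds) for x in new_readings])  # -> [True, True, True]
```

The check here is point by point; recognising a *sustained* shift, as in the example, would in practice mean looking at many consecutive flags (or a shift in the mean) rather than at a single reading.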
Types of Outliers
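Just as there are two types of anomalies, there are two types of outliers:
univariate and multivariate. Univariate outliers are extreme values of a
single variable; multivariate outliers are unusual combinations of values
across two or more variables, even when each value looks normal on its own.
Depending on the type, we will use different detection algorithms.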
2. Interquartile range (IQR): The IQR is the range between the first
quartile (Q1) and the third quartile (Q3) of a distribution, IQR = Q3 - Q1.
An instance that lies below Q1 or above Q3 by more than some multiple of
the IQR is considered an outlier. The most common multiplier is 1.5, so
values outside the range [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are flagged as
outliers.
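The IQR rule above can be sketched in a few lines of standard-library Python. The function name `iqr_outliers` and the sample data are hypothetical; note that several quartile conventions exist, and the `"inclusive"` method of `statistics.quantiles` used here is only one of them, so fences may differ slightly from other tools on small samples.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical daily temperatures (°C) with one suspicious reading.
temps = [22, 23, 24, 24, 25, 25, 26, 27, 28, 45]
print(iqr_outliers(temps))  # -> [45]
```

Here Q1 = 24 and Q3 = 26.75, so the fences are [19.875, 30.875] and only 45 falls outside them.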
2. Local Outlier Factor (LOF): LOF measures the local density deviation of
a sample with respect to its neighbours. Points whose local density is
substantially lower than that of their neighbours are flagged as outliers.
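To make "local density deviation" concrete, here is a simplified from-scratch sketch of LOF. It assumes Euclidean distance, exactly k neighbours per point (ties at the k-th distance are broken by index rather than expanded), and k smaller than the number of points. The function name and data are hypothetical; for real work, scikit-learn provides `sklearn.neighbors.LocalOutlierFactor`.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor of each point; scores well above 1 suggest outliers.

    Simplified sketch: Euclidean distance, exactly k neighbours per point.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbours = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        neighbours.append(others[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of i from neighbour j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical 2-D data: four clustered points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])  # -> [1.0, 1.0, 1.0, 1.0, 6.03]
```

Points inside the cluster score about 1 because their density matches that of their neighbours, while the isolated point scores about 6 because its neighbours are far denser than it is.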