CH 03 - 11 - Unsupervised Learning - Anomaly Detection
Course Content
Introduction
Applications
DBSCAN
• The algorithm starts by picking an arbitrary point.
• It then finds all points within distance eps of that point.
• If fewer than min_samples points lie within distance eps of the starting point, the point is labeled as noise, meaning that it does not belong to any cluster.
• If at least min_samples points lie within distance eps (in scikit-learn the count includes the point itself), the point is labeled a core sample and assigned a new cluster label.
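The steps above can be sketched in plain Python. This is a minimal illustration rather than scikit-learn's implementation: the helper names `region_query` and `dbscan` and the toy points are made up for the example, and a point is treated as a core sample when at least min_samples points (itself included) fall within eps, which is the scikit-learn convention.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch. Returns one label per point; -1 means noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE            # may be re-labeled later as a border point
            continue
        cluster += 1                     # i is a core sample: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable from a core sample
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep growing the cluster
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) has only itself within eps, so it is labeled -1 (noise), which is what makes DBSCAN usable as an anomaly detector.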
A small eps means that many points are labeled as noise. A very large eps results in most or all points forming a single cluster.
Increasing min_samples means that fewer points will be core points, and more points will be labeled as noise.
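The effect of both parameters can be checked with scikit-learn's `DBSCAN`. The sketch below assumes a synthetic dataset from `make_blobs`; the specific eps and min_samples values and the `summarize` helper are illustrative choices, not values from the slides.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Assumed toy data: 200 points around 3 centers.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.8, random_state=0)

def summarize(eps, min_samples):
    """Return (number of clusters, number of noise points) for one parameter setting."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_noise = int(np.sum(labels == -1))
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise

# Vary eps with min_samples fixed.
for eps in (0.001, 0.7, 100.0):
    print("eps =", eps, "->", summarize(eps, min_samples=5))

# Vary min_samples with eps fixed.
for min_samples in (3, 10, 30):
    print("min_samples =", min_samples, "->", summarize(0.7, min_samples))
```

With the tiny eps every point ends up as noise, and with the huge eps all points fall into one cluster. For a fixed eps, the noise count can only stay the same or grow as min_samples increases.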
Classification
Point Anomaly Detection
• scikit-learn estimators
  ◦ KernelDensity
  ◦ OneClassSVM
  ◦ IsolationForest
  ◦ LocalOutlierFactor
Source: https://coderzcolumn.com/tutorials/machine-learning/scikit-learn-sklearn-anomaly-detection-outliers-detection
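As a quick illustration of two of the estimators listed above, the sketch below runs `IsolationForest` and `LocalOutlierFactor` on an assumed toy dataset with one planted outlier. The data, the planted point, and the `contamination=0.05` setting are assumptions made for the example. Both estimators return +1 for inliers and -1 for outliers from `fit_predict`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Assumed toy data: one blob plus a single planted point far from it.
X, _ = make_blobs(n_samples=200, centers=1, cluster_std=1.0, random_state=0)
X = np.vstack([X, [[100.0, 100.0]]])

# contamination is the assumed fraction of outliers in the data.
iso_labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(X)
lof_labels = LocalOutlierFactor(n_neighbors=20, contamination=0.05).fit_predict(X)

print("IsolationForest label of planted point:", iso_labels[-1])
print("LocalOutlierFactor label of planted point:", lof_labels[-1])
print("Outliers flagged:", int((iso_labels == -1).sum()), int((lof_labels == -1).sum()))
```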
make_blobs
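A minimal sketch of generating a synthetic dataset with scikit-learn's `make_blobs`. The parameter values (500 samples, one center, `cluster_std=2.0`, `random_state=1`) are assumptions for illustration, not values recovered from the slide.

```python
from sklearn.datasets import make_blobs

# One Gaussian blob in 2-D; y holds the blob index of each sample (all 0 here).
X, y = make_blobs(n_samples=500, centers=1, cluster_std=2.0,
                  n_features=2, random_state=1)

print(X.shape)  # (500, 2)
print(y.shape)  # (500,)
```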
KernelDensity
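A hedged sketch of density-based scoring with `KernelDensity`: fit the estimator, then call `score_samples`, which returns the log-density of each sample. Low values mark points in sparse regions. The dataset, the Gaussian kernel, and `bandwidth=1.0` are assumptions for the example.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KernelDensity

# Assumed toy data: one 2-D blob.
X, _ = make_blobs(n_samples=500, centers=1, cluster_std=2.0, random_state=1)

kde = KernelDensity(kernel="gaussian", bandwidth=1.0)
kde.fit(X)

# Log-density of every training sample; lower means more isolated.
log_density = kde.score_samples(X)
print(log_density.shape)  # (500,)

# A point far from the blob scores lower than any training sample.
far_point = X.mean(axis=0, keepdims=True) + 50.0
print(kde.score_samples(far_point)[0] < log_density.min())  # True
```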
Dividing Dataset into Valid Samples and Outliers
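One common way to make this split is to threshold the `KernelDensity` log-density scores at a low quantile. The sketch below assumes a 5% quantile and a synthetic dataset; both are illustrative choices, and the variable names `valid` and `outliers` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KernelDensity

# Assumed toy data and density model.
X, _ = make_blobs(n_samples=500, centers=1, cluster_std=2.0, random_state=1)
log_density = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(X).score_samples(X)

# Samples whose log-density falls below the 5% quantile are treated as outliers.
threshold = np.quantile(log_density, 0.05)
is_outlier = log_density < threshold

outliers = X[is_outlier]
valid = X[~is_outlier]
print(len(valid), "valid samples,", len(outliers), "outliers")
```

The quantile plays the same role as the `contamination` parameter of other scikit-learn outlier detectors: it fixes the fraction of the data that is flagged.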
Plot Outliers with Valid Samples for Comparison
OneClassSVM
Predict Sample Class (Outlier vs Normal)
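A minimal sketch of fitting `OneClassSVM` and predicting the class of each sample. `predict` returns +1 for normal samples and -1 for outliers. The dataset and the settings `nu=0.05`, `kernel="rbf"`, and `gamma="scale"` are assumptions for the example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

# Assumed toy data: one 2-D blob.
X, _ = make_blobs(n_samples=500, centers=1, cluster_std=2.0, random_state=1)

# nu is approximately an upper bound on the fraction of training samples flagged as outliers.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(X)

pred = ocsvm.predict(X)  # +1 = normal, -1 = outlier
print("Outliers in training data:", int((pred == -1).sum()))

# A new point far from the training data is classified as an outlier.
far_point = X.mean(axis=0, keepdims=True) + 100.0
far_pred = ocsvm.predict(far_point)
print("Far point class:", int(far_pred[0]))
```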