Machine Learning for Anomaly Detection

December 2024
Agenda

1. Understanding techniques, applications, and best practices

2. Case studies

3. Points to remember

4. Resources and further reading

5. Questions and discussion


01
UNDERSTANDING TECHNIQUES, APPLICATIONS, AND BEST PRACTICES
Artificial Intelligence vs Machine Learning

AI vs ML?

Artificial intelligence (AI) is a broad concept that describes a machine's ability to mimic human intelligence.

Machine learning (ML) is a specific application of AI that teaches machines to perform tasks by learning from data.
WHAT IS MACHINE LEARNING?

Machine Learning Overview

• Machine Learning is a subset of AI that enables systems to learn and improve from experience without explicit programming.

• Key Focus: Patterns, predictions, and decision-making
WHAT IS ANOMALY DETECTION?

Anomaly detection refers to identifying patterns in data that do not conform to expected behavior.

Significant in applications like fraud detection, network security, and predictive maintenance.

Helps mitigate risks and improve decision-making processes.

Anomaly detection identifies suspicious activity that falls outside of your established normal patterns of behavior. An anomaly detection solution protects your system in real time from instances that could result in significant financial losses, data breaches, and other harmful events.
TYPES OF ANOMALIES

Point Anomalies
Data points significantly different from the majority (e.g., a sudden spike in network traffic).

Contextual Anomalies
Unusual only within a specific context (e.g., high temperature during winter).

Collective Anomalies
A collection of related data points that deviate as a group (e.g., a distributed denial-of-service attack).
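
The difference between a point anomaly and a contextual anomaly can be illustrated with a small sketch. The temperature readings, the season grouping, and the z-score threshold of 2.0 are all made up for illustration. A global z-score only catches the extreme reading, while scoring each season separately also catches the warm winter day.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily temperatures in degrees Celsius, grouped by season.
readings = {
    "winter": [2, 3, 1, 4, 2, 3, 0, 3, 24],       # 24 is warm only for winter
    "summer": [25, 27, 26, 24, 28, 26, 25, 27, 60],  # 60 is extreme anywhere
}

# Point anomalies: score every reading against the whole dataset.
all_values = readings["winter"] + readings["summer"]
global_flags = flag_outliers(all_values)

# Contextual anomalies: score each reading only against its own season.
contextual_flags = {season: flag_outliers(vals) for season, vals in readings.items()}

print("global:", global_flags)
print("per season:", contextual_flags)
```

In this toy data, 24 degrees looks ordinary next to the summer readings, so only the per-season scoring reports it.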
SUPERVISED ANOMALY DETECTION
• Supervised machine learning builds a predictive model using a labeled training set with normal and anomalous samples.

• The most common supervised methods include Bayesian networks, k-nearest neighbors, decision trees, supervised neural networks, and SVMs.

• The advantage of supervised models is that they may offer a higher rate of detection than unsupervised techniques.

UNSUPERVISED ANOMALY DETECTION
• Unsupervised methods do not demand manual labeling of training data. Instead, they operate on the presumption that collections of frequent, similar instances are normal, and they flag infrequent data groups as malicious.

• The most popular unsupervised anomaly detection algorithms include autoencoders, K-means, GMMs, hypothesis-test-based analysis, and PCA.

SEMI-SUPERVISED ANOMALY DETECTION


• Semi-supervised anomaly detection may refer to an approach to creating a model for normal data based on a data set that contains both normal and anomalous data, but is unlabelled.

• The most common semi-supervised methods include linear regression, outlier detection, and graph-based methods.

• A semi-supervised anomaly detection algorithm might also work with a data set that is partially flagged. It will then build a classification algorithm on just that flagged subset of data, and use that model to predict the status of the remaining data.
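
The partially flagged case can be sketched as follows. This is a minimal illustration, not a recommended model: the one-dimensional data and the nearest-centroid classifier are assumptions chosen to keep the example short. A classifier is fitted on the flagged subset only and then used to predict the status of the unflagged points.

```python
from statistics import mean

# Hypothetical one-dimensional measurements. Only some are flagged.
labeled = [(1.0, "normal"), (1.2, "normal"), (0.9, "normal"),
           (8.5, "anomaly"), (9.1, "anomaly")]
unlabeled = [1.1, 8.8, 0.8, 9.5]

# Fit: one centroid per class, computed from the flagged subset only.
centroids = {
    label: mean(x for x, y in labeled if y == label)
    for label in ("normal", "anomaly")
}

def predict(x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Predict the status of the remaining, unflagged data.
predictions = [predict(x) for x in unlabeled]
print(list(zip(unlabeled, predictions)))
```

Any supervised classifier could stand in for the nearest-centroid rule here. The point is only that training uses the flagged subset and prediction covers the rest.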
WHY USE MACHINE LEARNING FOR ANOMALY DETECTION?

Advantages of ML

• Handles complex and large datasets effectively.

• Learns from data to adapt to new patterns dynamically.

• Often provides better accuracy than traditional statistical methods.

Challenges

• Data imbalance: Anomalies are rare compared to normal data.

• Dynamic and non-stationary data: Data evolves over time, requiring adaptive models.

• High dimensionality: Complex data structures make anomalies harder to detect.
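
The data imbalance challenge can be made concrete with a small sketch. The 990-to-10 class split is an assumed figure for illustration. A trivial "detector" that labels everything as normal reaches 99% accuracy while finding none of the anomalies, which is why accuracy alone is a poor guide on imbalanced data.

```python
# Hypothetical labels: 990 normal samples (0) and 10 anomalies (1).
labels = [0] * 990 + [1] * 10

# A trivial baseline that never flags anything.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
false_negatives = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / (true_positives + false_negatives)

print(f"accuracy = {accuracy:.2f}, recall = {recall:.2f}")
```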
COMMON ALGORITHMS IN ANOMALY DETECTION

Algorithm Types

• Supervised: Random Forest, SVM for binary classification.
• Unsupervised: PCA, k-Means, Isolation Forest for detecting patterns.
• Deep Learning: Autoencoders, RNNs for complex data types like time series.

Anomaly Detection Algorithm Techniques To Know

• Isolation Forest
• Local Outlier Factor (LOF)
• Robust Covariance
• One-class support vector machine (SVM)
• One-class SVM with stochastic gradient descent (SGD)
• K-means clustering
• Long short-term memory (LSTM)
• Angle-based outlier detection
Techniques

Isolation Forest
Isolation Forest isolates anomalies by creating random partitions in the data. Anomalies are isolated faster than normal points due to their distinct properties.

Local Outlier Factor
LOF identifies anomalies by comparing the local density of a point to its neighbors. Points with significantly lower density than their neighbors are flagged as outliers.

Robust Covariance
Robust covariance is a statistical method that computes the covariance matrix to identify data points deviating from the multivariate distribution.

One-Class Support Vector Machine (SVM)
A One-Class SVM creates a boundary around normal data points in a high-dimensional space, classifying points outside the boundary as anomalies.

One-Class SVM with SGD
This method optimizes One-Class SVM using Stochastic Gradient Descent to handle large-scale datasets efficiently.

K-Means Clustering
K-Means groups data into clusters, and points far from any cluster center are considered anomalies.

Long Short-Term Memory (LSTM)
LSTMs are a type of recurrent neural network that learns temporal dependencies in sequential data. They identify anomalies by analyzing deviations from learned patterns.

Angle-Based Outlier Detection
This method calculates the angle between points in high-dimensional space to detect anomalies. Anomalies are identified based on deviations from expected angular distributions.
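
Four of these techniques have estimators in scikit-learn, and a minimal sketch of running them side by side might look like this. The synthetic data, the five planted outliers, and the parameter values (`contamination=0.05`, `nu=0.05`, `n_neighbors=20`) are assumptions for illustration, not tuned recommendations. Robust covariance is represented here by `EllipticEnvelope`.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic data: 100 normal points around the origin plus 5 planted outliers.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
planted = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0], [-8.0, -8.0], [0.0, 10.0]])
X = np.vstack([normal, planted])

detectors = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
    "Robust Covariance": EllipticEnvelope(contamination=0.05, random_state=0),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma="scale"),
}

# fit_predict returns 1 for inliers and -1 for outliers.
results = {name: detector.fit_predict(X) for name, detector in detectors.items()}

for name, labels in results.items():
    flagged = np.flatnonzero(labels == -1)
    print(f"{name}: flagged {len(flagged)} of {len(X)} points")
```

The planted outliers occupy the last five rows of `X`, so `results[name][-5:]` shows how each detector treated them.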
[Figures: Isolation Forest, Local Outlier Factor (LOF), One-Class Support Vector Machine (SVM), Long Short-Term Memory (LSTM), K-Means Clustering]
EXAMPLES OF ALGORITHM APPLICATIONS

Isolation Forest Example
Detecting fraudulent transactions in credit card data using an Isolation Forest algorithm.

Local Outlier Factor (LOF) Example
Identifying unusual behavior in user activity logs for cybersecurity.

Robust Covariance Example
Detecting unusual patterns in multivariate sensor data in manufacturing processes.

One-Class Support Vector Machine (SVM) Example
Detecting abnormal network traffic in IT infrastructure.

One-Class SVM with SGD Example
Detecting outliers in massive customer behavior datasets in e-commerce.

K-Means Clustering Example
Identifying rare diseases in patient medical records by analyzing cluster distances.

Long Short-Term Memory (LSTM) Example
Detecting anomalies in time-series data, such as server logs or stock market fluctuations.

Angle-Based Outlier Detection Example
Detecting outliers in large, high-dimensional datasets like gene expression data.
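
The cluster-distance idea behind the K-Means example can be sketched with the standard library alone. The two-dimensional points, the initial centers, and the distance threshold of 2.0 are hypothetical, and the fixed iteration count is a simplification of the usual convergence check.

```python
from math import dist

def kmeans(points, centers, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers

# Two tight groups plus one point that belongs to neither.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
          (3.0, 9.0)]

centers = kmeans(points, centers=[points[0], points[4]])

# Anomaly score: distance to the closest cluster center.
distances = [min(dist(p, c) for c in centers) for p in points]
threshold = 2.0  # assumed cutoff, chosen by eye for this toy data
anomalies = [p for p, d in zip(points, distances) if d > threshold]
print(anomalies)
```

Note that the far point still pulls its cluster center toward itself, so in practice the threshold is often set from the distribution of distances rather than fixed in advance.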
INFERENCE

Key Inference

Anomaly detection techniques are vital for uncovering irregularities in various domains. Choosing the right algorithm depends on:

1. Dataset
2. Scale
3. Application requirements
PRACTICAL WORKFLOW FOR ANOMALY DETECTION

Workflow

Step 1: Data preprocessing: handle missing data, outliers, and normalization.

Step 2: Algorithm selection based on the data and problem type.

Step 3: Model evaluation using key metrics like F1-score.

Step 4: Deploy the model and monitor its performance.
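
Steps 1 to 3 can be sketched end to end on a toy series. Everything here is assumed for illustration: the readings, the ground-truth labels, median imputation, min-max scaling, and the simple distance-from-median detector with a 0.5 cutoff that stands in for whichever algorithm Step 2 selects.

```python
from statistics import median

# Hypothetical sensor readings with gaps, and ground truth (1 = anomaly).
raw = [10.0, 11.0, None, 9.5, 10.5, 30.0, 22.0, None, 9.8, 28.0]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

# Step 1: fill missing values with the median, then scale to [0, 1].
fill = median(v for v in raw if v is not None)
filled = [fill if v is None else v for v in raw]
low, high = min(filled), max(filled)
scaled = [(v - low) / (high - low) for v in filled]

# Step 2: stand-in detector that flags points far from the median.
center = median(scaled)
predictions = [1 if abs(v - center) > 0.5 else 0 for v in scaled]

# Step 3: precision, recall, and F1-score against the labels.
tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For Step 4, the same metrics could be recomputed on fresh labeled data at intervals, so that a drop in F1-score signals that the model needs retraining.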
02
CASE STUDIES
03 Resources and Further Reading

1. Books: 'Anomaly Detection Principles and Algorithms' by Mehrotra, Mohan, and Huang.


2. Courses: 'Machine Learning for All', AI/ML at IIT
3. Datasets: UCI Machine Learning Repository
Q&A
Thank you
