6anomaly Fraud Detection

Anomaly detection techniques are used to identify unusual patterns or outliers in data. Simple statistical methods flag outliers based on standard deviations from the mean or interquartile ranges. Machine learning approaches include density-based methods that identify anomalies as points farther away from dense neighborhoods, and clustering-based methods that find outliers outside of normal clusters. These techniques have applications in fraud detection, system monitoring, and intrusion detection.

Uploaded by

Saugat Tripathi
Copyright
© All Rights Reserved

Anomaly / Fraud Detection

Anomaly detection is a technique used to identify unusual patterns that do not conform to expected behavior; such patterns are called outliers.

Outlier detection (also known as anomaly detection) is the process of finding data objects with
behaviors that are very different from expectation. Such objects are called outliers or anomalies.

It has many applications in business, from intrusion detection (identifying strange patterns
in network traffic that could signal a hack) to system health monitoring (spotting a
malignant tumor in an MRI scan), and from fraud detection in credit card transactions to
fault detection in operating environments.
Anomalies can be broadly categorized as:

1. Point anomalies: A single data instance is anomalous if it is too far off from the rest. Business use case: detecting credit card fraud based on "amount spent."

2. Contextual anomalies: The abnormality is context-specific. This type of anomaly is common in time-series data. Business use case: spending $100 on food every day during the holiday season is normal, but may be odd otherwise.

3. Collective anomalies: A set of data instances collectively helps in detecting anomalies. Business use case: someone is unexpectedly copying data from a remote machine to a local host, an anomaly that would be flagged as a potential cyberattack.

Anomaly Detection Techniques

i) Simple Statistical Methods

The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, such as the mean, median, mode, and quantiles. For example, an anomalous data point could be defined as one that deviates by more than a certain number of standard deviations from the mean, or one that falls outside the interquartile-range (IQR) fences.

Algorithm (IQR method):
Example: 10, 11, 15, 25, 35, 30, 7, 68

 Sort: 7, 10, 11, 15, 25, 30, 35, 68
 Find the quartiles: Q1 = 10.5; Q3 = 32.5
 Find the interquartile range: IQR = Q3 - Q1 = 22
 Multiply the IQR by 1.5: 1.5 × 22 = 33
 Subtract this from Q1 and add it to Q3:
 10.5 - 33 = -22.5
 32.5 + 33 = 65.5
 Check the dataset for any value smaller than Q1 - 1.5 × IQR or larger than Q3 + 1.5 × IQR.

68 > 65.5, so 68 is an outlier.
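The steps above can be sketched in Python. The helper name `iqr_outliers` is illustrative, and the quartiles are computed with the exclusive-median convention used in the worked example (other quartile conventions give slightly different fences):

```python
from statistics import median

def iqr_outliers(data, k=1.5):
    """Flag values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(data)
    half = len(s) // 2
    q1 = median(s[:half])    # median of the lower half
    q3 = median(s[-half:])   # median of the upper half
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]
```

On the example data, `iqr_outliers([10, 11, 15, 25, 35, 30, 7, 68])` flags only 68, matching the computation above.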

ii) Machine Learning-Based Approaches

Density-Based Anomaly Detection

Density-based anomaly detection is based on the k-nearest neighbors algorithm.

Assumption: Normal data points occur around a dense neighborhood and abnormalities are
far away.

The nearest set of data points is evaluated using a score, which could be the Euclidean distance or a similar measure, depending on the type of data (categorical or numerical). These approaches can be broadly classified into two algorithms:

1. K-nearest neighbor: k-NN is a simple, non-parametric lazy learning technique used to classify data based on similarity in distance metrics such as Euclidean, Manhattan, Minkowski, or Hamming distance.
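As a minimal sketch of the distance-based idea (not a full k-NN classifier), the following scores each point by its mean Euclidean distance to its k nearest neighbors; the most isolated points get the largest scores. The function name `knn_score` is our own:

```python
import math

def knn_score(points, k=2):
    """Anomaly score: mean Euclidean distance to the k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Four points in a tight cluster plus one isolated point
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_score(pts)  # the isolated point gets the highest score
```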

2. Relative density of data: This is better known as the local outlier factor (LOF). The concept is based on a distance metric called the reachability distance.

LOF(k) ≈ 1 means similar density as neighbors,
LOF(k) < 1 means higher density than neighbors (inlier),
LOF(k) > 1 means lower density than neighbors (outlier).
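A minimal pure-Python sketch of LOF, following the definitions above (k-distance, reachability distance, local reachability density); in practice a library implementation such as scikit-learn's `LocalOutlierFactor` would be used instead:

```python
import math

def lof(points, k=2):
    """Local outlier factor for each point (pure-Python sketch)."""
    n = len(points)
    knn, kdist = [], []
    for i in range(n):
        # All other points ordered by distance from point i
        order = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        knn.append([j for _, j in order[:k]])   # k nearest neighbors of i
        kdist.append(order[k - 1][0])           # k-distance of i

    def reach(i, j):
        # Reachability distance from i to j
        return max(kdist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of mean reachability distance
    lrd = [k / sum(reach(i, j) for j in knn[i]) for i in range(n)]
    # LOF: mean ratio of neighbors' density to the point's own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]
```

For four points on a unit square plus one far-away point, the cluster points score close to 1 and the isolated point scores well above 1, as the rules of thumb above predict.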

Clustering-Based Anomaly Detection

Clustering is one of the most popular concepts in the domain of unsupervised learning.

Assumption: Data points that are similar tend to belong to similar groups or clusters, as
determined by their distance from local centroids.

K-means is a widely used clustering algorithm. It creates 'k' similar clusters of data points.
Data instances that fall outside of these groups could potentially be marked as anomalies.
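A minimal sketch of this idea: run a plain k-means (with a simple deterministic initialization), then flag points that lie farther than an assumed distance `threshold` from their assigned centroid. Both the function name `kmeans_anomalies` and the cutoff are illustrative choices, not a standard recipe:

```python
import math

def kmeans_anomalies(points, k=2, threshold=3.0, iters=20):
    """Cluster with k-means, then return indices of points farther than
    `threshold` (an assumed cutoff) from their assigned centroid."""
    # Deterministic init: use the first k points as centroids (fine for a sketch)
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return [i for i, (p, l) in enumerate(zip(points, labels))
            if math.dist(p, centroids[l]) > threshold]
```

With two tight clusters and one distant point, only the distant point ends up far from every centroid and gets flagged; the cutoff would normally be tuned to the data (e.g. from the distribution of distances to centroids).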
