Unit 2 - Part A

This document discusses anomaly detection and dimensionality reduction. It defines anomalies as deviations from expected patterns in data. There are different types of anomalies, including point anomalies where a single data point is very different, and collective anomalies where a group of related data points together deviate from larger patterns. Anomaly detection is useful for applications like fraud detection, health monitoring, and intrusion detection to identify unexpected patterns that could signal problems. Dimensionality reduction techniques like PCA can also help detect anomalies by reducing noise and identifying underlying patterns in data.

Uploaded by

Raksheet Jain

Outlier Detection and

Dimensionality Reduction
Unit 2
What is Anomaly?
• Various management software can be used to evaluate the operational performance of applications and Key Performance Indicators (KPIs) that measure the success of the organization
• Within a given dataset, there are data patterns that represent business as usual
• An unexpected change within these data patterns, or an event that does not conform to the expected data pattern, is considered an anomaly
• In other words, an anomaly is a deviation from business as usual
Topics to be covered….
• Introduction to anomaly (outlier) detection
• Types of anomaly detection
• Applications of Outlier detection
• Proximity based Outlier detection: distance and density based outlier
detection
• One class SVM
• Principal Component Analysis (PCA),
• Applications of PCA,
• Autoencoders: Denoising Autoencoders, Variational Autoencoders
• Applications of Autoencoders
What is Anomaly?
• It is not unusual for an e-Commerce website to collect a large amount of revenue on specific days, such as the festival season, because of the high volume of sales during that period
• It would be an anomaly if a company did not have a high sales volume on these days, especially if festival sales in previous years were very high
• A value can be an anomaly if it breaks a pattern that is normal for the data from that particular metric
• Anomalies aren't categorically good or bad; they are deviations from the expected value for a metric at a given point in time
Introduction
• Usually, anomalies are difficult for a human expert to detect manually
• These items/events are called outliers
• Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities
Outliers
• Outliers are often visible symptoms.
Example: Outliers
• Suppose an e-commerce website has a glitch in pricing
• A price is entered as $10 instead of $100 for a product
Example: Outliers
• Deeper inspection can identify the underlying error
• For example, customers start buying multiple units of the underpriced product
Need for Anomaly Detection
• Normal activity can be compared with known diseases such as malaria, dengue, swine flu, etc., for which we have a cure
• SARS-CoV-2 (COVID-19), on the other hand, is an anomaly: it shows the characteristics of a normal disease, with the exception of delayed symptoms
• Had the SARS-CoV-2 anomaly been detected at a very early stage, its spread could have been contained significantly
• Since SARS-CoV-2 is an entirely new anomaly that had never been seen before, even a supervised learning procedure would have failed to detect it as an anomaly
Need for Anomaly Detection
• A supervised learning model learns patterns from the features and labels in the dataset
• By providing normal data on pre-existing diseases to an unsupervised learning algorithm, we could have detected this virus as an anomaly with high probability, since it would not have fallen into the category of normal diseases
• Therefore, unsupervised learning methods are preferred over supervised learning methods in most cases
What is time series data anomaly detection?
• Anomaly detection is based on the ability to accurately analyze time series data in real time
• Time series data is composed of a sequence of values over time
• Each sample is typically a pair of two items: a timestamp for when the metric was measured, and the value associated with the metric at that time
• Time series data is a record that contains the information necessary for making educated guesses about what can reasonably be expected in the future
• Anomaly detection systems use those expectations to identify actionable signals within the data
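As a minimal sketch of these (timestamp, value) pairs, the snippet below flags a value that deviates from the running mean of all previous values; the data and the ±50% tolerance are made-up assumptions for illustration, not a production detector:

```python
from datetime import datetime

# A time series sample is a (timestamp, value) pair, as described above.
series = [
    (datetime(2021, 3, 1, 12, 0), 101.0),
    (datetime(2021, 3, 1, 12, 1), 98.5),
    (datetime(2021, 3, 1, 12, 2), 102.3),
    (datetime(2021, 3, 1, 12, 3), 250.0),   # unexpected spike
    (datetime(2021, 3, 1, 12, 4), 99.1),
]

# Naive expectation (an assumption for this sketch): each value should stay
# within +/-50% of the running mean of the values seen so far.
anomalies = []
for i in range(1, len(series)):
    prev_mean = sum(v for _, v in series[:i]) / i
    ts, v = series[i]
    if abs(v - prev_mean) > 0.5 * prev_mean:
        anomalies.append(ts)
print(anomalies)   # only the 12:03 spike is flagged
```

Real systems replace the running mean with a seasonal or model-based expectation, but the structure (expected value vs. observed value per timestamp) is the same.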
Applications
• Business
• Intrusion detection (identifying strange patterns in network traffic
that could signal a hack)
• Health monitoring (spotting a malignant tumor in an MRI scan)
• Fraud detection in credit card transactions….
Anomaly detection is not Noise detection
Anomaly detection is related to, but distinct from, noise removal and novelty detection
• Novelty detection
is concerned with identifying an unobserved pattern in new observations not included in the training data
Ex: a sudden interest in a new channel on YouTube during Christmas
• Noise removal
is the process of immunizing the analysis from the occurrence of unwanted observations
In other words, removing noise from an otherwise meaningful signal
Categories of Anomaly
1. Point anomaly (Global outlier)
2. Contextual anomalies
3. Collective anomalies
Point Anomaly
• The value of the outlier is much different from the other samples
• Business use case: detecting credit card fraud based on "amount spent"
• Use of Zoom meetings from January to February 2021 increased by 100%; from February to March it increased by 400%
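A point anomaly in an "amount spent" metric can be flagged with a simple z-score check against historical values; the numbers and the 3-sigma threshold below are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Hypothetical monthly "amount spent" values; the last one is a point anomaly.
spend = np.array([120, 135, 110, 128, 140, 125, 132, 900], dtype=float)

mean, std = spend[:-1].mean(), spend[:-1].std()  # baseline from history
z = (spend - mean) / std                         # z-score of each value
outliers = np.where(np.abs(z) > 3)[0]            # common 3-sigma rule
print(outliers)   # index 7 (the 900 deposit) is flagged
```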
Point Anomaly
• Example (figure): a point anomaly in a global-economy metric
Contextual (Conditional) Outliers
• Its value significantly deviates from the rest of the data points in the same context
• The same value may not be considered an outlier if it occurred in a different context
• Common in time-series data
Contextual (Conditional) Outliers
• Generally, for time series data, the "context" is temporal, because time series data are records of a specific quantity over time
• Business use case: spending INR 5,000 on food every day during the holiday season is normal, but may be odd otherwise
• In Mumbai it rains in June; if it rains in January, it is an outlier
Contextual (Conditional) Outliers
• Example (figure): a contextual outlier due to a pricing glitch
Contextual (Conditional) Outliers
• Values are not outside the normal global range
• But they are abnormal compared to the seasonal pattern
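The Mumbai rainfall example can be sketched by comparing a new observation against the statistics of its context (the same month in previous years). The rainfall numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical monthly rainfall (mm) for Mumbai over 3 years:
# rows = years, columns = months (Jan..Dec). Values are invented.
rain = np.array([
    [2, 1, 0, 5, 40, 500, 600, 450, 300, 80, 10, 3],
    [1, 0, 2, 8, 55, 520, 650, 480, 320, 70, 12, 2],
    [3, 2, 1, 6, 50, 510, 620, 460, 310, 90, 11, 4],
], dtype=float)

month_mean = rain.mean(axis=0)          # per-month (contextual) baseline
month_std = rain.std(axis=0) + 1e-9     # avoid division by zero

# New observation: 120 mm of rain in January (month index 0).
obs_month, obs_value = 0, 120.0
z = (obs_value - month_mean[obs_month]) / month_std[obs_month]
print(z > 3)   # True: globally modest, but extreme for January
```

The same 120 mm in June would be unremarkable; the context (month) is what makes it an outlier.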
Collective Anomaly
• A set of data instances collectively helps in detecting anomalies
• Business use case: someone is unexpectedly trying to copy data from a remote machine to a local host; an anomaly that would be flagged as a potential cyber attack
• A subset of data points within a dataset is considered anomalous if those values as a collection deviate significantly from the entire dataset, but the values of the individual data points are not themselves anomalous in either a contextual or global sense
Collective Anomaly
• In time series data, a collective anomaly can appear as normal peaks and valleys occurring outside of the time frame in which that seasonal sequence is normal
• Or as a combination of time series that is in an outlier state as a group
Collective Anomaly
• Two time series that were discovered to be related to each other are combined into a single anomaly
• For each time series, the individual behavior does not deviate significantly from the normal range, but the combined anomaly indicates a bigger issue
Collective Anomaly
• A group of data points in a large dataset is significantly different from the other points, but each data point on its own is not anomalous
• Ex: two time series are combined; individually they do not deviate significantly, but after combining they indicate a big issue
Collective Anomaly
• A group of people leave a neighborhood on the same day
• Generally, individuals leave from time to time
• It is unusual that a large group leaves the neighborhood at the same time
Collective Anomaly
• Running an ad campaign
• With an increase in budget, there is an increase in the number of clicks and impressions
• When these changes happen together unexpectedly, there may be an issue with the campaign
• Individually, they are not anomalous
Example: anomalies
• A plane landing on a highway is a global outlier, because it is a truly rare event that a plane would have to land there
• If the highway was congested with traffic at 3 a.m., that would be a contextual outlier, since traffic doesn't usually start until later in the morning when people are heading to work
• If every car on the freeway moved to the left lane at the same time, that would be a collective outlier: although it is not rare for individual cars to move to the left lane, it is unusual that all cars would relocate at the same exact time
Example: anomalies
• A banking customer who normally deposits no more than INR 1,000 a month in checks at a local ATM suddenly makes two cash deposits of INR 5,000 each in the span of two weeks; this is a global anomaly, because such an event has never before occurred in this customer's history
• The time series data of their weekly deposits would show an abrupt recent spike
• Such a drastic change would raise alarms, as these large deposits could imply illicit commerce or money laundering
When to use time series anomaly detection?
• Depending on the business model and use case, anomaly detection is used for valuable metrics such as:
• Web page views
• Daily active users
• Mobile app installs
• Cost per click
• Customer acquisition costs
• Revenue per click
• Volume of transactions
• Average order value
• And more
Simple Example of Anomaly
• In a two-dimensional dataset (axes X and Y), N1 and N2 are regions of normal behaviour
• Points o1 and o2 are anomalies
• Points in region O3 are anomalies
Key Challenges in Anomaly Detection
• Defining a representative normal region is challenging
• The boundary between normal and outlying behaviour is often not
precise
• The exact notion of an outlier is different for different application
domains
• Availability of labelled data for training/validation
• Malicious adversaries
• Data might contain noise
• Normal behaviour keeps evolving
Aspects of Anomaly Detection Problem
• Nature of input data
• Availability of supervision
• Type of anomaly: point, contextual, collective
• Output of anomaly detection
• Evaluation of anomaly detection techniques
Input Data
• Most common form of data handled by anomaly detection techniques is Record Data
  • Univariate
  • Multivariate

Tid | SrcIP         | Start time | Dest IP        | Dest Port | Number of bytes | Attack
 1  | 206.135.38.95 | 11:07:20   | 160.94.179.223 | 139       | 192             | No
 2  | 206.163.37.95 | 11:13:56   | 160.94.179.219 | 139       | 195             | No
 3  | 206.163.37.95 | 11:14:29   | 160.94.179.217 | 139       | 180             | No
 4  | 206.163.37.95 | 11:14:30   | 160.94.179.255 | 139       | 199             | No
 5  | 206.163.37.95 | 11:14:32   | 160.94.179.254 | 139       | 19              | Yes
 6  | 206.163.37.95 | 11:14:35   | 160.94.179.253 | 139       | 177             | No
 7  | 206.163.37.95 | 11:14:36   | 160.94.179.252 | 139       | 172             | No
 8  | 206.163.37.95 | 11:14:38   | 160.94.179.251 | 139       | 285             | Yes
 9  | 206.163.37.95 | 11:14:41   | 160.94.179.250 | 139       | 195             | No
10  | 206.163.37.95 | 11:14:44   | 160.94.179.249 | 139       | 163             | Yes
Input Data – Nature of Attributes
• Nature of attributes
  • Binary
  • Categorical
  • Continuous
  • Hybrid

Tid | SrcIP         | Duration | Dest IP        | Number of bytes | Internal
 1  | 206.163.37.81 | 0.10     | 160.94.179.208 | 150             | No
 2  | 206.163.37.99 | 0.27     | 160.94.179.235 | 208             | No
 3  | 160.94.123.45 | 1.23     | 160.94.179.221 | 195             | Yes
 4  | 206.163.37.37 | 112.03   | 160.94.179.253 | 199             | No
 5  | 206.163.37.41 | 0.32     | 160.94.179.244 | 181             | No
Data Labels
• Supervised Anomaly Detection
• Labels available for both normal data and anomalies
• Similar to rare class mining
• Semi-supervised Anomaly Detection
• Labels available only for normal data
• Unsupervised Anomaly Detection
• No labels assumed
• Based on the assumption that anomalies are very rare
compared to normal data
Output of Anomaly Detection
• Label
• Each test instance is given a normal or anomaly label
• This is especially true of classification-based approaches
• Score
• Each test instance is assigned an anomaly score
• Allows the output to be ranked
• Requires an additional threshold parameter
Taxonomy of Anomaly Detection Approaches
• Point Anomaly Detection
  • Classification Based: Rule Based, Neural Networks Based, SVM Based
  • Nearest Neighbor Based: Distance Based, Density Based
  • Clustering Based
  • Statistical: Parametric, Non-parametric
  • Others: Information Theory Based, Spectral Decomposition Based, Visualization Based
• Contextual Anomaly Detection
• Collective Anomaly Detection
• Online Anomaly Detection
• Distributed Anomaly Detection

*Outlier Detection – A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report TR07-17, University of Minnesota
Nearest Neighbor Based Techniques
• Key assumption: normal points have close neighbors while
anomalies are located far from other points
• General two-step approach:
1. Compute the neighborhood for each data record
2. Analyze the neighborhood to determine whether the data record is an anomaly or not
• Categories:
• Distance based methods
• Anomalies are data points most distant from other points
• Density based methods
• Anomalies are data points in low density regions
Nearest Neighbor Based Techniques
• Advantage
• Can be used in unsupervised or semi-supervised setting (do not make any
assumptions about data distribution)
• Drawbacks
• If normal points do not have a sufficient number of neighbors, the techniques may fail
• Computationally expensive
• In high-dimensional spaces, data is sparse and the concept of similarity may not be meaningful anymore
Due to the sparseness, distances between any two data records may become quite similar => each data record may be considered a potential outlier!
Nearest Neighbor Based Techniques
• Distance based approaches
• A point O in a dataset is a DB(p, d) outlier if at least a fraction p of the points in the dataset lie at a distance greater than d from O
• Density based approaches
• Compute local densities of particular regions and declare instances in low density regions as potential anomalies
• Approaches:
• Local Outlier Factor (LOF)
• Connectivity Outlier Factor (COF)
• Multi-Granularity Deviation Factor (MDEF)
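The DB(p, d) definition above translates directly into code. This sketch uses randomly generated data and arbitrary p and d values for illustration:

```python
import numpy as np

def is_db_outlier(o, X, p, d):
    """O is a DB(p, d)-outlier if at least fraction p of the dataset
    lies at a distance greater than d from O."""
    dists = np.linalg.norm(X - o, axis=1)   # Euclidean distances to O
    frac_far = np.mean(dists > d)           # fraction of points beyond d
    return frac_far >= p

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))               # cluster around the origin
print(is_db_outlier(np.array([6.0, 6.0]), X, p=0.95, d=2.0))  # far point
print(is_db_outlier(np.array([0.0, 0.0]), X, p=0.95, d=2.0))  # central point
```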
Nearest Neighbor Based Techniques
• Distance based approach
• For each data point d, compute the distance to its k-th nearest neighbor, dk
• Sort all data points according to the distance dk
• Outliers are points that have the largest distance dk and are therefore located in the sparser neighborhoods
• Usually, data points that have the top n% of distances dk are identified as outliers
• n is a user parameter
• Not suitable for datasets that have modes with varying density
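The k-th-nearest-neighbor distance procedure above can be sketched with scikit-learn's NearestNeighbors; the data, k, and the n% cutoff are assumptions for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X = np.vstack([X, [[8.0, 8.0]]])            # plant one far-away point

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own 0-NN
dists, _ = nn.kneighbors(X)
dk = dists[:, k]                            # distance to the k-th nearest neighbor

n_pct = 1.0                                 # flag the top n% (user parameter)
n_out = max(1, int(len(X) * n_pct / 100))
outlier_idx = np.argsort(dk)[-n_out:]       # largest dk values
print(outlier_idx)                          # includes the planted point (index 200)
```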
Nearest Neighbor Based Techniques
Density based approach
Local Outlier Factor (LOF)
• Local outlier factor (LOF) is an algorithm used for unsupervised outlier detection
• When a point is considered an outlier based on its local neighborhood, it is a local outlier
• LOF identifies an outlier by considering the density of its neighborhood
• LOF performs well when the density of the data is not the same throughout the dataset
• It produces an anomaly score that represents the data points which are outliers in the dataset
• It does this by measuring the local density deviation of a given data point with respect to the data points near it
Local Outlier Factor (LOF)
Sequential Steps for LOF:
• K-distance and K-neighbors
• Reachability distance (RD)
• Local reachability density (LRD)
• Local Outlier Factor (LOF)
Local Outlier Factor (LOF)
K-DISTANCE AND K-NEIGHBORS
• K-distance is the distance between a point and its Kᵗʰ nearest neighbor
• K-neighbors, denoted by Nk(A), is the set of points that lie in or on the circle of radius K-distance
• The number of K-neighbors can be greater than or equal to the value of K (because of ties)
• Consider four points A, B, C, and D
• If K=2, the K-neighbors of A will be C, B, and D
• Here, the value of K=2 but ||N₂(A)|| = 3
• Therefore, ||Nk(point)|| will always be greater than or equal to K
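The tie-breaking behavior above can be checked with a small sketch; the four points are hypothetical coordinates chosen so that A's neighbor distances tie:

```python
import numpy as np

# Hypothetical points: B, C, D are all at distance 1 from A, so the
# 2-distance of A is 1 and all three fall inside N_2(A).
pts = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (-1.0, 0.0), "D": (0.0, 1.0)}

def k_distance_and_neighbors(point, others, k):
    """k-distance = distance to the k-th nearest neighbor;
    N_k = all points whose distance is <= k-distance (ties included)."""
    dists = sorted((np.linalg.norm(np.subtract(point, q)), name)
                   for name, q in others.items())
    k_dist = dists[k - 1][0]
    neighbors = {name for d, name in dists if d <= k_dist}
    return k_dist, neighbors

others = {n: p for n, p in pts.items() if n != "A"}
k_dist, n_k = k_distance_and_neighbors(pts["A"], others, k=2)
print(k_dist, sorted(n_k))   # K=2 but ||N_2(A)|| = 3 because of ties
```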
Local Outlier Factor (LOF)
REACHABILITY DISTANCE (RD)
• It is defined as the maximum of the K-distance of Xj and the distance between Xi and Xj
• The distance measure is problem-specific (Euclidean, Manhattan, etc.)
• If a point Xi lies within the K-neighbors of Xj, the reachability distance will be the K-distance of Xj (blue line in the figure); otherwise, the reachability distance will be the distance between Xi and Xj (orange line)
Local Outlier Factor (LOF)

LOCAL REACHABILITY DENSITY (LRD)
• The reachability distances to all of the k-nearest neighbors of a point are calculated to determine the Local Reachability Density (LRD) of that point
• The local reachability density is a measure of the density of the k-nearest points around a point
• The closer the points are, the smaller the distance and the higher the density; hence the inverse is taken in the equation
• LRD is the inverse of the average reachability distance of A from its neighbors
• Intuitively, according to the LRD formula, the greater the average reachability distance (i.e., the farther the neighbors are from the point), the lower the density of points around that point
• This tells how far a point is from the nearest cluster of points
• A low value of LRD implies that the closest cluster is far from the point
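The verbal definitions above correspond to the standard LOF formulas (a sketch in the usual notation, with Nₖ(A) and k-distance as defined earlier):

```latex
\text{reach-dist}_k(A, B) = \max\{\, k\text{-distance}(B),\; d(A, B) \,\}

\mathrm{lrd}_k(A) = \left( \frac{\sum_{B \in N_k(A)} \text{reach-dist}_k(A, B)}{\lVert N_k(A) \rVert} \right)^{-1}

\mathrm{LOF}_k(A) = \frac{\sum_{B \in N_k(A)} \mathrm{lrd}_k(B)}{\lVert N_k(A) \rVert \cdot \mathrm{lrd}_k(A)}
```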
Local Outlier Factor (LOF)
LOCAL OUTLIER FACTOR (LOF)
• The LRD of each point is compared with the average LRD of its K neighbors
• LOF is the ratio of the average LRD of the K neighbors of A to the LRD of A
• Intuitively, if the point is not an outlier (i.e., an inlier), the average LRD of its neighbors is approximately equal to the LRD of the point (because the density of the point and of its neighbors are roughly equal); in that case, LOF is nearly equal to 1
• On the other hand, if the point is an outlier, the LRD of the point is less than the average LRD of its neighbors, and the LOF value will be high
• Generally, if LOF > 1 the point is considered an outlier, but that is not always true
• If we know that there is only one outlier in the data, we take the maximum LOF value among all the LOF values, and the point corresponding to that maximum is considered the outlier
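In practice, the whole pipeline above is available as scikit-learn's LocalOutlierFactor; the data and n_neighbors choice below are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.normal(0, 0.5, size=(100, 2))       # dense "normal" cluster
X = np.vstack([X, [[4.0, 4.0]]])            # one obvious local outlier

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                 # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_      # LOF scores; larger => more anomalous
print(labels[-1], scores.argmax())          # the planted point is flagged
```

Note that fit_predict follows the label-style output discussed earlier, while negative_outlier_factor_ gives the score-style output that can be ranked or thresholded.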