Unit V Outlier 2


Outlier Analysis

Identifying Outliers

Assume that a given statistical process is generating a set of data objects.


An outlier is a data object that deviates significantly from the rest of the
objects, as if it were generated by a different mechanism.
Outliers are different from noisy data.
Noise is a random error or variance in a measured variable. In general, noise
is not interesting in data analysis.

Example: a customer may generate noisy data in everyday purchases, such as in a restaurant.
Noise should be removed before outlier detection.
Outliers are interesting because they are suspected of not being
generated by the same mechanisms as the rest of the data.
Therefore, in outlier detection, it is important to justify why the
outliers detected are generated by some other mechanisms.
This is often achieved by making various assumptions on the rest of
the data and showing that the outliers detected violate those
assumptions significantly.
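
One way to make this concrete, offered as an illustrative assumption rather than something the slides prescribe, is to suppose that the normal objects follow a normal distribution with mean \mu and standard deviation \sigma. An object x then violates the assumption significantly when its standardized distance from the mean exceeds a chosen cutoff k. The value k = 3 is a conventional choice, since about 99.7% of normally distributed values lie within three standard deviations of the mean.

```latex
z(x) = \frac{|x - \mu|}{\sigma},
\qquad
x \text{ is flagged as an outlier if } z(x) > k
```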
Outlier Types

Global outliers

Contextual outliers

Collective outliers
Global Outliers

◦ A data object is a global outlier if it deviates significantly from the rest of the data set.

◦ Global outliers are sometimes called point anomalies and are the simplest type of outlier.
Examples of global outliers:
◦ Global outlier detection is important in many applications. Consider intrusion detection
in computer networks, for example.

◦ If the communication behavior of a computer is very different from the normal patterns
(e.g., a large number of packets is broadcast in a short time), this behavior may be
considered a global outlier, and the corresponding computer is a suspected victim of
hacking.

◦ In trading transaction auditing systems, transactions that do not follow the regulations
are considered global outliers and should be held for further examination.
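
The intrusion-detection example above can be sketched in a few lines of Python. This is only a minimal illustration: the slides do not name a detection method, so the sketch swaps in the modified z-score (based on the median and the median absolute deviation), and the function name `global_outliers`, the 3.5 threshold, and the packet counts are all hypothetical.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so one extreme value cannot hide itself by
    inflating the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical packets-per-second counts observed for one computer.
packets_per_second = [12, 15, 11, 14, 13, 12, 950, 16, 13]
suspicious = global_outliers(packets_per_second)
print(suspicious)  # index of the burst of 950 packets
```

A median-based score is used here because a plain mean and standard deviation are themselves distorted by the very outlier being looked for, especially in small samples.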
Contextual Outliers

◦ An object that deviates significantly with respect to a specific context of the object.

◦ Also called conditional outliers because they are conditional on the selected
context.

◦ Example: “The temperature today is 28 degrees Celsius. Is it exceptional?”

◦ The answer depends on the context: the date, the location, and other factors.


In a given data set, a data object is a contextual outlier if it deviates significantly with
respect to a specific context of the object. Contextual outliers are also known as conditional
outliers because they are conditional on the selected context.

Therefore, in contextual outlier detection, the context has to be specified as part of the
problem definition.

Generally, in contextual outlier detection, the attributes of the data objects in question are
divided into two groups:
◦ Contextual attributes: The contextual attributes of a data object define the object’s context.
In the temperature example, the contextual attributes may be date and location.

◦ Behavioral attributes: These define the object’s characteristics, and are used to evaluate
whether the object is an outlier in the context to which it belongs. In the temperature
example, the behavioral attributes may be the temperature, humidity, and pressure.
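
The split into contextual and behavioral attributes can be sketched as follows. This is a hedged illustration, not a method from the slides: the context is assumed to be a (location, month) pair, the behavioral attribute is the temperature, and a reading is compared only against past readings from the same context. The function name `is_contextual_outlier`, the threshold of 3 standard deviations, and all temperature values are made up for the example.

```python
from statistics import mean, stdev

def is_contextual_outlier(context, value, history, threshold=3.0):
    """Check a behavioral value against past values from the same context.

    history maps a context (here a (location, month) tuple) to a list
    of past readings of the behavioral attribute.
    """
    past = history.get(context, [])
    if len(past) < 2:
        return False  # not enough data in this context to judge
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical past temperatures in degrees Celsius, keyed by context.
history = {
    ("Toronto", "January"): [-6.0, -3.5, -8.0, -5.0, -4.5, -7.0],
    ("Toronto", "July"): [27.0, 29.5, 26.0, 28.5, 30.0, 27.5],
}

winter = is_contextual_outlier(("Toronto", "January"), 28.0, history)
summer = is_contextual_outlier(("Toronto", "July"), 28.0, history)
print(winter, summer)
```

The same reading of 28 degrees is judged differently in the two contexts, which is exactly why the context has to be part of the problem definition.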
Collective Outliers

◦ Given a data set, a subset of data objects forms a collective outlier if the objects as a
whole deviate significantly from the entire data set.

◦ Importantly, the individual data objects may not be outliers.

Example: in intrusion detection, a denial-of-service packet from one node to
another is normal. But if several nodes keep sending denial-of-service packets to each
other, they as a whole should be considered a collective outlier.

Unlike other outliers, in collective outlier detection, we have to consider not only the
behavior of individual objects, but also that of groups of objects.
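
A rough sketch of the group-level view, under stated assumptions: events are hypothetical (timestamp, node) pairs, time is cut into fixed windows, and a window is flagged when at least `min_nodes` distinct nodes sent such packets in it. The function name, window length, and threshold are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def collective_outlier_windows(events, window=10, min_nodes=3):
    """Return start times of fixed windows in which many nodes were active.

    events is an iterable of (timestamp_seconds, source_node) pairs.
    A single event is never flagged on its own; only the group of
    events falling into the same window is evaluated.
    """
    nodes_per_window = defaultdict(set)
    for timestamp, node in events:
        start = (timestamp // window) * window  # start of the fixed window
        nodes_per_window[start].add(node)
    return sorted(start for start, nodes in nodes_per_window.items()
                  if len(nodes) >= min_nodes)

# Hypothetical denial-of-service packets: isolated ones, then a burst.
events = [(3, "A"), (27, "B"), (41, "A"), (42, "C"),
          (44, "B"), (47, "D"), (63, "C")]
flagged = collective_outlier_windows(events)
print(flagged)  # only the window starting at t=40 involves several nodes
```

Each packet taken alone looks normal; only the window in which four different nodes are active stands out as a group.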
Outlier detection (also known as anomaly detection) is the process of finding data
objects with behaviors that are very different from expectation.
Such objects are called outliers or anomalies.
Outlier detection is important in many applications in addition to fraud detection,
such as medical care, public safety and security, industry damage detection, image
processing, sensor/video network surveillance, and intrusion detection.
Challenges of Outlier Detection

Modeling normal objects and outliers effectively

Application-specific outlier detection

Handling noise in outlier detection

Understandability