Ads Exp 7

This document discusses anomaly and outlier detection techniques, categorizing outliers into global, contextual, and collective types. It emphasizes the importance of detecting outliers for data integrity, model accuracy, and identifying business insights, while also outlining various statistical and machine learning methods for detection. Additionally, it describes strategies for handling outliers, including removal, transformation, imputation, and separate analysis.

Uploaded by

sakshipssb

EXPERIMENT NO.

AIM: Anomaly / Outlier detection techniques

THEORY

Outliers are data points that significantly differ from the rest of the dataset. They can
arise due to measurement errors, natural variability, or external factors. Detecting and
handling outliers is essential for ensuring data quality and improving model accuracy.

1. Types of Outliers

Outliers can be categorized into three main types:

A) Global Outliers

Global outliers, also known as point anomalies, are individual data points that deviate
significantly from the rest of the dataset.

Example:

●​ A student's test score of 10 in a class where all other scores range between 70
and 100.
●​ A house priced at $10 million in a neighborhood where the average price is
$300,000.

B) Contextual Outliers

Contextual outliers depend on the specific context of the data. A value may be an outlier
in one scenario but not in another.

Example:

● A temperature of 35°C in winter is an outlier, but the same temperature in
summer is normal.
● A monthly sales spike is expected during a festival season, but the same spike
in an ordinary month would be an outlier.
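The temperature example can be sketched by scoring each reading against other readings from the same context instead of against the whole dataset. This is a minimal illustration, not a prescribed method: the function name `contextual_outliers`, the sample readings, and the threshold of 2 standard deviations (chosen because the sample is small) are all assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_outliers(readings, threshold=2.0):
    """Return (context, value) pairs that are unusual within their own context."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)

    flagged = []
    for context, value in readings:
        values = by_context[context]
        if len(values) < 2:
            continue  # spread cannot be estimated from a single reading
        spread = stdev(values)
        if spread == 0:
            continue  # all readings identical, nothing stands out
        z = (value - mean(values)) / spread
        if abs(z) > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical temperatures in degrees Celsius, labelled by season.
readings = [
    ("winter", 4), ("winter", 6), ("winter", 5), ("winter", 3),
    ("winter", 7), ("winter", 5), ("winter", 35),
    ("summer", 33), ("summer", 35), ("summer", 36), ("summer", 34),
    ("summer", 35), ("summer", 37), ("summer", 34),
]
flagged = contextual_outliers(readings)
print(flagged)
```

The value 35 is flagged only for winter; the same value among the summer readings is left alone because it is typical for that context.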

C) Collective Outliers

Collective outliers occur when a group of data points deviates from the expected
pattern, even though individual points may not appear anomalous.

Example:

● A sudden drop in website traffic for a week due to a server failure.
● A group of fraudulent transactions made by multiple accounts over a short
period.
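One simple way to look for a collective outlier such as the week-long traffic drop is to judge whole windows of consecutive points rather than single values. The sketch below is illustrative only: the function name `collective_outlier_windows`, the 7-day window, the 30% drop limit, and the traffic figures are assumptions.

```python
from statistics import median

def collective_outlier_windows(series, window=7, max_drop=0.3):
    """Return start indices of windows whose mean falls more than
    `max_drop` (as a fraction) below the median of the whole series."""
    baseline = median(series)
    starts = []
    for start in range(len(series) - window + 1):
        window_mean = sum(series[start:start + window]) / window
        if window_mean < baseline * (1 - max_drop):
            starts.append(start)
    return starts

# Hypothetical daily visits: four weeks, with a dip on days 7 to 13.
traffic = [
    1000, 1020, 980, 1010, 990, 1005, 995,
    620, 600, 640, 610, 630, 605, 615,
    1000, 1015, 985, 1010, 990, 1000, 1005,
    995, 1010, 1000, 990, 1005, 1015, 985,
]
dip_starts = collective_outlier_windows(traffic)
print(dip_starts)
```

The flagged windows cluster around day 7, where the dip begins. Windows that only partly overlap the dip are flagged as well, so in practice neighboring start indices would usually be merged into one event.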

2. Importance of Outlier Detection

Outlier detection is crucial because:

A) Ensuring Data Integrity

Outliers can arise due to errors in data collection or recording. Identifying and correcting
these errors improves data reliability.

B) Improving Model Accuracy

Machine learning models can be highly sensitive to outliers, especially models that
minimize squared error, such as linear regression. Removing or otherwise handling
outliers helps prevent skewed predictions and can improve model performance.

C) Identifying Business Insights

Outliers often represent unusual events that may have business significance, such as
sudden demand surges, equipment failures, or fraudulent activities.

D) Preventing Fraud

Detecting anomalous transactions in financial data can help identify fraudulent activities
and prevent financial losses.

3. Methods of Detecting Outliers

Several statistical and machine learning techniques can be used for outlier detection:

A) Statistical Methods

1. Z-Score Analysis: Identifies data points that lie more than a chosen number of
standard deviations (commonly 3) from the mean.
2. Interquartile Range (IQR): Flags values below Q1 - 1.5 * IQR or above
Q3 + 1.5 * IQR, where Q1 and Q3 are the first and third quartiles and
IQR = Q3 - Q1.
3. Box Plot Analysis: A visual representation of the data distribution in which
points beyond the whiskers are highlighted as potential outliers.
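The z-score and IQR rules can be sketched with Python's standard `statistics` module. This is a minimal illustration under stated assumptions: the function names, the cut-offs (3 standard deviations and a multiplier of 1.5), the choice of the population standard deviation, the "inclusive" quartile method, and the sample scores are not taken from the experiment itself.

```python
from statistics import mean, pstdev, quantiles

def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so no value can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]

def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical class scores: one score of 10 among scores between 70 and 100.
scores = [70, 72, 74, 75, 77, 78, 80, 81, 83, 85,
          86, 88, 90, 91, 93, 95, 97, 99, 100, 10]
z_flagged = zscore_outliers(scores)
iqr_flagged = iqr_outliers(scores)
print(z_flagged, iqr_flagged)
```

Both rules flag only the score of 10 here. The z-score rule can miss outliers in very small samples because the outlier itself inflates the standard deviation, whereas the IQR rule relies on quartiles and is less affected by extreme values.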

B) Machine Learning Approaches

1. Isolation Forest: Detects outliers by randomly partitioning the dataset and
identifying points that require fewer splits to isolate.
2. DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Identifies
dense regions and marks points in low-density regions as noise (outliers).
3. Autoencoders: A neural network-based method that reconstructs normal data well
but struggles with anomalies, so a high reconstruction error signals an outlier.
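To make the density idea behind DBSCAN concrete, the following is a small hand-written sketch that returns only the noise points. It is a simplified teaching version, not a library implementation: the function name `dbscan_noise`, the sample points, and the parameter values are assumptions, while `eps` (neighborhood radius) and `min_samples` (minimum neighbors for a core point) follow the usual DBSCAN terminology.

```python
from math import dist

def dbscan_noise(points, eps, min_samples):
    """Return the points that a basic DBSCAN pass would label as noise."""
    n = len(points)
    # Neighborhood of each point (a point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbors]

    labels = [None] * n  # None = unassigned; cluster ids are 0, 1, ...
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None or not core[i]:
            continue
        # Grow a new cluster outward from this unassigned core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] is None:
                    labels[j] = cluster_id
                    if core[j]:
                        frontier.append(j)  # only core points keep expanding
        cluster_id += 1

    # Points never reached from any core point are noise.
    return [points[i] for i in range(n) if labels[i] is None]

# Hypothetical 2-D data: two tight groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 1.0),
]
noise = dbscan_noise(points, eps=0.5, min_samples=3)
print(noise)
```

This version compares every pair of points, so it is only suitable for small datasets. For real work, a library implementation such as scikit-learn's `DBSCAN`, which marks noise points with the label -1, would normally be used.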

4. Handling Outliers

Once detected, outliers can be managed through different techniques:

A) Removing Outliers

● Suitable when the outlier is due to errors or irrelevant noise.
● Example: Incorrect sensor readings in industrial monitoring.

B) Transforming Data

● Applying log transformations or normalization to reduce the impact of extreme
values.
● Example: Converting salary data to a log scale for regression models.
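The effect of a log transformation on extreme values can be seen in a short sketch. The helper name `log_transform` and the salary figures are hypothetical, and the natural logarithm is used here as one possible choice of base.

```python
import math

def log_transform(values):
    """Apply a natural-log transform; every value must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]

# Hypothetical salaries with one extreme value.
salaries = [30_000, 35_000, 40_000, 45_000, 50_000, 2_000_000]
log_salaries = log_transform(salaries)

raw_spread = max(salaries) - min(salaries)
log_spread = max(log_salaries) - min(log_salaries)
print(raw_spread, round(log_spread, 2))
```

On the raw scale the extreme salary is almost two million units away from the rest, while on the log scale the whole range spans only a few units, so the extreme value pulls far less on a fitted model. For data that contains zeros, `math.log1p` is a common alternative.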

C) Imputation

● Replacing outliers with the median, mean, or nearest neighbor values.
● Example: Replacing extreme temperatures with the average of neighboring days.
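The neighboring-days example can be sketched as follows, assuming the outlier positions have already been identified by one of the detection methods. The function name `impute_with_neighbors` and the temperature values are hypothetical.

```python
def impute_with_neighbors(values, outlier_indices):
    """Replace each flagged value with the mean of its nearest unflagged neighbors."""
    flagged = set(outlier_indices)
    result = list(values)
    for i in flagged:
        # Nearest unflagged value on each side, if one exists.
        left = next(
            (values[j] for j in range(i - 1, -1, -1) if j not in flagged), None
        )
        right = next(
            (values[j] for j in range(i + 1, len(values)) if j not in flagged), None
        )
        candidates = [v for v in (left, right) if v is not None]
        if candidates:
            result[i] = sum(candidates) / len(candidates)
        # If every value is flagged, the original value is left unchanged.
    return result

# Hypothetical daily temperatures in degrees Celsius; index 2 is a faulty reading.
temps = [21.0, 22.0, 58.0, 23.0, 22.5]
cleaned = impute_with_neighbors(temps, [2])
print(cleaned)
```

The faulty reading is replaced by the average of the days on either side, and the input list is left unmodified. A flagged value at the start or end of the series falls back to its single available neighbor.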

D) Treating Separately

●​ In cases where outliers represent rare but important occurrences (e.g., fraud
detection), they should be analyzed separately rather than removed.

CONCLUSION

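In this experiment, we studied the types of outliers, why detecting them matters, and
the statistical and machine learning methods used to detect and handle them.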

Outlier detection is a fundamental step in data analysis that ensures data quality and
improves decision-making.
