Ads Exp 7 - Labmanual

The document outlines an experiment for implementing outlier detection using the DBSCAN algorithm in an Applied Data Science Lab course. It details the use of the make_blobs dataset, the parameters of the DBSCAN algorithm, and the classification of points into core, border, and noise categories. The experiment utilizes Python libraries such as sklearn for DBSCAN implementation and matplotlib for visualization.

Department of Computer Engineering

Academic Year: 2024-25 Semester: VIII


Class / Branch: BE Computer Subject: Applied Data Science Lab

Experiment No. 7

1. Aim: Implement outlier detection using a density-based method.

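Dataset: The make_blobs function is used to generate the data samples for the experiment.
The make_blobs function is part of sklearn.datasets. Older scikit-learn releases exposed it
through sklearn.datasets.samples_generator, a module that has since been removed.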
2. Software used: Google Colaboratory / Jupyter Notebook
3. Theory :- Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a base
algorithm for density-based clustering. It can discover clusters of different shapes and sizes in a
large dataset that contains noise and outliers.
The DBSCAN algorithm uses two parameters:
minPts: The minimum number of points (a threshold) that must lie close together for a region to be
considered dense. For example, if minPts is set to 5, at least 5 points are needed to form a dense
region.
eps (ε): The distance used to find the points in the neighborhood of any point. It specifies how
close points must be to each other to be considered part of the same cluster: if the distance
between two points is less than or equal to eps, the points are considered neighbors.
There are three types of points after the DBSCAN clustering is complete:
Core: A point that has at least minPts points within distance eps of itself.
Border: A point that has fewer than minPts points within distance eps of itself but lies within
distance eps of at least one core point.
Noise: A point that is neither a core point nor a border point. It has fewer than minPts points
within distance eps of itself, and none of them is a core point.
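The three point types can be illustrated with a minimal pure-Python sketch. The function name classify_points and the six sample points are hypothetical and chosen only for illustration. The sketch also assumes that a point counts as its own neighbor, which is the convention scikit-learn uses for min_samples.

```python
from math import dist  # Euclidean distance, Python 3.8+


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise'.

    Assumption: a point is counted in its own eps-neighborhood.
    """
    # Indices of all points within eps of each point (including itself).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i, n in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in n):
            labels.append("border")   # not dense itself, but next to a core point
        else:
            labels.append("noise")    # not dense and no core point nearby
    return labels


# Hypothetical sample: a unit square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2.0, 1.0) has only two points in its neighborhood, so it is not a core point, but it lies within eps of the core point (1.0, 1.0) and is therefore a border point.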
The algorithm proceeds by arbitrarily picking an unvisited point in the dataset, repeating until all points
have been visited. If there are at least minPts points within a radius of ε of that point, all of these points
are considered part of the same cluster. The cluster is then expanded by recursively repeating the
neighborhood calculation for each neighboring point; a neighbor extends the cluster further only if it is
itself a core point.
The DBSCAN algorithm can be abstracted in the following steps:
• Find all the neighbor points within eps of every point, and identify as core points those with at least
minPts neighbors.
• For each core point, if it is not already assigned to a cluster, create a new cluster.
• Recursively find all of its density-connected points and assign them to the same cluster as the core point.
• Two points a and b are said to be density-connected if there exists a point c that has a sufficient number
of points in its neighborhood and from which both a and b can be reached through a chain of core points,
each within eps of the next. This is a chaining process: if b is a neighbor of c, c is a neighbor of d, d is a
neighbor of e, and e is in turn a neighbor of a, then b is density-connected to a, provided the
intermediate points c, d and e are core points.
• Iterate through the remaining unvisited points in the dataset. Those points that do not belong to any
cluster are noise.
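The steps above can be sketched from scratch in plain Python. This is an illustrative sketch, not the implementation used by sklearn: the names region_query and dbscan and the sample points are hypothetical, a queue replaces the recursion, a point is assumed to count as its own neighbor, and noise is labeled -1 to match the sklearn convention.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for noise points


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # only core points extend the cluster
                queue.extend(j_neighbors)
    return labels


# Hypothetical sample: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The isolated point (5, 5) has no neighbor other than itself, so it never joins a cluster and keeps the noise label.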
4. Program:
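A minimal sketch of such a program is given here, assuming the sklearn and numpy packages are installed. The function names detect_outliers and plot_result, the parameter values (300 samples, 3 centers, cluster_std=0.6, eps=0.5, min_samples=5, seed 42) and the three hand-placed far-away points are illustrative assumptions, not values taken from the manual.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs


def detect_outliers(n_samples=300, eps=0.5, min_samples=5, seed=42):
    """Cluster a make_blobs dataset with DBSCAN; label -1 marks outliers."""
    # Assumed settings: three compact blobs (centers fall inside [-10, 10]).
    X, _ = make_blobs(n_samples=n_samples, centers=3,
                      cluster_std=0.6, random_state=seed)
    # Assumption: append three far-away points to act as obvious outliers.
    far_points = np.array([[30.0, 30.0], [-30.0, 30.0], [30.0, -30.0]])
    X = np.vstack([X, far_points])

    model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    return X, model.labels_


def plot_result(X, labels):
    """Scatter plot of the clusters, with outliers drawn as red crosses."""
    # Imported here so the clustering part also runs where matplotlib is absent.
    import matplotlib.pyplot as plt

    noise = labels == -1
    plt.scatter(X[~noise, 0], X[~noise, 1], c=labels[~noise], cmap="viridis", s=15)
    plt.scatter(X[noise, 0], X[noise, 1], c="red", marker="x", label="outliers (noise)")
    plt.title("DBSCAN outlier detection")
    plt.legend()
    plt.show()


X, labels = detect_outliers()
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_outliers = int(np.sum(labels == -1))
print("clusters found:", n_clusters)
print("outliers found:", n_outliers)
# In Colab or Jupyter, call plot_result(X, labels) to draw the figure.
```

In sklearn the minPts parameter is called min_samples and counts the point itself. Points labeled -1 in labels_ are the detected outliers.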
5. Conclusion :- The Python library sklearn is used to implement DBSCAN for density-based outlier
detection in the experiment, and the matplotlib.pyplot library is used to visualize the clusters and
outliers.
