Esam - DWM Lab 8

Name: Esam Ashfaq Date: 21-04-2024

PRN: 21070122049

Practical No: 8
___________________________________________________________________________
Title:
Implement DBSCAN data mining algorithm using both Python and DM tool
(RapidMiner)
___________________________________________________________________________
Objective:
Students will learn and implement:

• DBSCAN data mining algorithm


___________________________________________________________________________
Description:
Clustering:

Clustering algorithms are a core component of machine learning: they group similar data points based on proximity or similarity within a dataset, without requiring pre-existing labels or supervision. These algorithms uncover inherent patterns, structures, and relationships in data, with applications such as image recognition, customer segmentation, anomaly detection, and recommendation systems.

DBSCAN:

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters as dense regions in the data space, separated by regions of lower density whose points are treated as noise. Its core principle is the notion of a core point: a point that has at least a minimum number of neighbors within a specified neighborhood radius. Clusters are grown outward from core points, and points not reachable from any core point are labeled as noise.
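
As a minimal illustration of the core-point idea (a sketch written for this description, not scikit-learn's implementation; the function name find_core_points and the brute-force distance computation are choices made here for clarity), the following snippet counts each point's neighbors within a radius eps and flags points with at least min_pts neighbors as core points:

import numpy as np

def find_core_points(X, eps, min_pts):
    """Return a boolean mask marking core points: points with at least
    min_pts neighbors (counting themselves) within distance eps."""
    # Pairwise Euclidean distances; fine for small datasets.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    neighbor_counts = (dists <= eps).sum(axis=1)  # each point counts itself
    return neighbor_counts >= min_pts

# Example on a tiny 2-D dataset: three nearby points and one outlier.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0]])
print(find_core_points(X, eps=0.5, min_pts=3))  # [ True  True  True False]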

To effectively implement DBSCAN, two crucial parameters must be considered:

• Eps (ε): This parameter defines the radius around a data point within which other points are considered its neighbors; two points are neighbors if the distance between them is at most ε. Selecting an appropriate ε is critical: too small a value labels much of the data as noise, while too large a value merges distinct clusters and consolidates most points into one. A common way to choose ε is to analyze the k-distance graph (a sketch of this appears after this list).

• MinPts: This parameter specifies the minimum number of neighboring points required within the ε radius for a point to be a core point. As a general guideline, MinPts should be at least 3 and at least D + 1, where D is the number of dimensions of the dataset; larger or noisier datasets generally call for higher values.
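
The k-distance graph referenced above can be produced as follows; this is a sketch assuming scikit-learn and matplotlib, with k set to 5 to match the MinPts value used in the program below. Each point's distance to its k-th nearest neighbor is sorted and plotted, and the "elbow" of the curve is a common choice for ε:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Same synthetic data as the main program below.
X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=20)

k = 5  # typically chosen equal to MinPts
nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors()           # with no argument, a point is not its own neighbor
k_distances = np.sort(distances[:, -1])  # sorted distance to each point's k-th nearest neighbor

plt.plot(k_distances)
plt.xlabel("Points sorted by k-distance")
plt.ylabel("Distance to 5th nearest neighbor")
plt.title("k-distance graph for choosing eps")
plt.show()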

DBSCAN is effective for automatically identifying and grouping similar data points without the number of clusters being specified in advance, making it valuable for exploratory data analysis and for uncovering underlying data structures. Its density-based approach can find clusters of arbitrary shape and explicitly labels points in low-density regions as noise.
___________________________________________________________________________
Program code (Python):
Dataset-

Code-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN

# Generate a synthetic 2-D dataset: 500 points drawn from 3 Gaussian blobs.
X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=20)

# Visualize the raw, unlabeled data.
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], s=50)
plt.title("Generated Data Points")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

# Run DBSCAN with neighborhood radius eps = 1 and MinPts = 5.
epsilon = 1
min_samples = 5
dbscan = DBSCAN(eps=epsilon, min_samples=min_samples)
clusters = dbscan.fit_predict(X)  # one cluster label per point; -1 marks noise

# Plot the clustering result, coloring points by cluster label.
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=50)
plt.colorbar(label='Cluster')
plt.title("DBSCAN Clustering Result")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
# Overlay noise points (label -1) as red crosses.
plt.scatter(X[clusters == -1, 0], X[clusters == -1, 1], c='red', marker='x', s=100, label='Noise points')
plt.legend()
plt.show()
Input and Output:
___________________________________________________________________________
Model Design (RapidMiner):
Dataset-

(1000 Data Points)

Design-

Input and Output:


___________________________________________________________________________
Conclusion:
Thus, we have implemented the DBSCAN clustering algorithm in Python and designed an equivalent DBSCAN clustering process in RapidMiner.
___________________________________________________________________________
