Data Mining Module 3 Notes

Part C

Module 4: Data Mining

2020 March
25. Explain the requirements for clustering.
Clustering is an unsupervised learning technique that groups data points according to their similarity or distance to one another. For a clustering algorithm to produce accurate and meaningful results, certain requirements need to be fulfilled. The following are the main requirements for clustering:

● Similarity measure: A distance metric or similarity measure must be defined to calculate the distance or similarity between any two data points. The measure used must be appropriate for the data being clustered and should take into account the domain-specific characteristics of the data.

● Scaling: Clustering is highly sensitive to the scale of the data. Therefore, it is important to ensure that the data has been properly scaled to eliminate any bias introduced by different scales of measurement.

● Noise handling: Clustering algorithms can be highly sensitive to noise, outliers, and
irrelevant data points. Therefore, it is important to identify and remove such data
points before clustering.

● Handling large datasets: Clustering algorithms can be computationally expensive and may not be suitable for large datasets. Therefore, efficient algorithms must be used to handle large datasets.

● Evaluation: The quality of the clustering results must be evaluated to ensure that the results are meaningful and useful. This can be done using metrics such as the silhouette coefficient, the Davies-Bouldin index, or purity (a short code sketch follows this list).
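As a minimal sketch of two of these requirements in practice, the snippet below standardizes features that live on very different scales and then scores the clustering with the silhouette coefficient. It assumes scikit-learn and NumPy are available; the toy data and parameter values are invented purely for illustration.

```python
# Minimal sketch: scaling the data, then evaluating the clustering.
# Assumes scikit-learn and NumPy; the toy data below is made up.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two features on very different scales (e.g. age in years, income in dollars).
X = np.array([[25, 40000], [30, 42000], [23, 38000],
              [45, 90000], [50, 95000], [48, 88000]], dtype=float)

# Scaling: standardize so both features contribute equally to distances.
X_scaled = StandardScaler().fit_transform(X)

# Cluster the scaled data into two groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

# Evaluation: the silhouette coefficient ranges from -1 to 1;
# values near 1 indicate compact, well-separated clusters.
print("silhouette:", silhouette_score(X_scaled, labels))
```

Without the scaling step, the income feature would dominate every distance computation and the age feature would be effectively ignored.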

By ensuring that these requirements are met, clustering algorithms can produce accurate
and meaningful results that can be used for a variety of applications such as customer
segmentation, image segmentation, and anomaly detection.

2021 April
25. Explain the concept of DBSCAN algorithm.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm that groups together data points lying close to each other in high-density regions. The algorithm uses two important parameters, epsilon (ε) and minPts, to identify clusters. Epsilon defines the maximum distance between two data points for them to be considered neighbors, and minPts defines the minimum number of data points required to form a dense region.

The algorithm starts by selecting an arbitrary point from the dataset and finding all the neighboring points that lie within ε distance of it. If the number of neighboring points is greater than or equal to minPts, a cluster is formed, and the process is repeated for all the neighboring points until no more points can be added to the cluster. If the number of neighboring points is less than minPts, the point is provisionally marked as noise (it may later be absorbed into a cluster as a border point). The process is repeated until every point is either assigned to a cluster or marked as noise.
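As a minimal illustration of this procedure (assuming scikit-learn is available; the points and parameter values below are made up), scikit-learn's DBSCAN can be applied directly, where eps and min_samples play the roles of ε and minPts:

```python
# Minimal DBSCAN sketch; assumes scikit-learn, with invented toy data.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],   # dense region A
              [8.0, 8.1], [8.2, 7.9], [7.9, 8.0],   # dense region B
              [4.5, 4.5]])                          # an isolated point

# eps corresponds to epsilon, min_samples to minPts.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1 -1]; the label -1 marks noise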

DBSCAN has several advantages over other clustering algorithms such as its ability to find
clusters of arbitrary shapes and its ability to handle noise in the data. However, it requires
careful selection of the parameters ε and minPts, and its performance can be affected by the
density of the data and the dimensionality of the feature space.

2022 April
25. Explain with an example the K-medoids algorithm.
The K-medoids algorithm is a clustering algorithm that partitions a dataset into k clusters, where each cluster is represented by one of its own data points, known as the medoid. The algorithm is similar to K-means but is more robust to noise and outliers, because a medoid must be an actual data point rather than an average. K-medoids uses a dissimilarity measure to calculate the distance between each point and its corresponding medoid, and it iteratively updates the medoids and assigns each data point to the closest medoid until convergence.

Let's consider an example to illustrate the K-medoids algorithm. Suppose we have a dataset
of five points in a 2D space, (2,3), (3,2), (4,2), (4,4), and (5,4). We want to cluster these
points into two groups using the K-medoids algorithm. We can start by randomly selecting
two medoids from the dataset, say (2,3) and (5,4). We can then calculate the dissimilarity of
each point to each medoid using a distance metric, such as Euclidean distance.


For instance, the dissimilarity of (3,2) to (2,3) is √((3−2)² + (2−3)²) = √2 ≈ 1.41, and its dissimilarity to (5,4) is √((3−5)² + (2−4)²) = √8 ≈ 2.83. Similarly, we can calculate the dissimilarity of every other point to each medoid.

After calculating the dissimilarity of each point to each medoid, we assign each point to the medoid it is closest to. Here (2,3), (3,2), and (4,2) are assigned to the first medoid, and (4,4) and (5,4) to the second; for (4,2) the two distances tie at 2.24, so we keep it with the first medoid. We then calculate the sum of the dissimilarities of the points in each cluster to their medoid.

In this case, the sum of the dissimilarities is 0 + 1.41 + 2.24 = 3.65 for the first cluster and 1.0 + 0 = 1.0 for the second, giving a total cost of 4.65.

Within each cluster, the algorithm then selects the point with the lowest sum of dissimilarities to the other cluster members as the new medoid. In the first cluster, (3,2) has cost 1.41 + 0 + 1.0 = 2.41, lower than the 3.65 of (2,3) and the 3.24 of (4,2), so (3,2) becomes the new medoid; in the second cluster, (4,4) and (5,4) tie at 1.0, so (5,4) remains the medoid. We then repeat the process of assigning each point to the closest medoid and updating the medoids until convergence. Here the reassignment leaves the clusters unchanged, so the algorithm converges to two clusters: (2,3), (3,2), and (4,2) in one cluster, and (4,4) and (5,4) in the other.
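To make the procedure concrete, here is a minimal sketch of the alternating assign-and-update steps in plain NumPy, run on the five points above. It is not a library implementation, and the tie-breaking behavior (argmin keeps the earliest candidate) is a choice of this sketch:

```python
# Minimal k-medoids sketch on the five points from the worked example.
# Plain NumPy; tie-breaking follows np.argmin (earliest candidate wins).
import numpy as np

points = np.array([[2, 3], [3, 2], [4, 2], [4, 4], [5, 4]], dtype=float)
medoids = [0, 4]  # start with (2,3) and (5,4), as in the example

for _ in range(100):  # iterate until the medoids stop changing
    # Assignment step: each point joins its closest medoid.
    dists = np.linalg.norm(points[:, None] - points[medoids][None, :], axis=2)
    labels = dists.argmin(axis=1)

    # Update step: within each cluster, pick the member whose total
    # distance to the other members is smallest.
    new_medoids = []
    for k in range(len(medoids)):
        members = np.where(labels == k)[0]
        costs = [np.linalg.norm(points[members] - points[m], axis=1).sum()
                 for m in members]
        new_medoids.append(int(members[np.argmin(costs)]))

    if new_medoids == medoids:  # converged: medoids no longer change
        break
    medoids = new_medoids

print("medoids:", points[medoids].tolist())
print("labels:", labels)  # [0 0 0 1 1] -> the same two clusters as above
```

Note that (4,4) and (5,4) tie as candidate medoids for the second cluster, so this sketch may report (4,4) rather than (5,4) as that cluster's medoid; the resulting clusters are the same as in the worked example.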
