
Part A

Module 4 : Data Mining

2020 March
9. What do you mean by constrained based clustering?
Constraint-based clustering is a clustering technique that uses additional constraints,
such as must-link and cannot-link constraints, to guide the clustering process and improve
its accuracy.
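A minimal sketch of how must-link and cannot-link constraints can be represented and checked against a candidate clustering; the constraint pairs and label vectors below are illustrative assumptions, not part of any standard library:

```python
# Must-link: the two points must end up in the same cluster.
# Cannot-link: the two points must end up in different clusters.
must_link = [(0, 1)]        # points 0 and 1 must share a cluster
cannot_link = [(0, 3)]      # points 0 and 3 must not share a cluster

def satisfies(labels):
    """Check a candidate cluster assignment against the constraints."""
    ok_must = all(labels[a] == labels[b] for a, b in must_link)
    ok_cannot = all(labels[a] != labels[b] for a, b in cannot_link)
    return ok_must and ok_cannot

print(satisfies([0, 0, 1, 1]))   # True: respects both constraints
print(satisfies([0, 1, 1, 1]))   # False: violates the must-link pair
```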

10. What is a dendrogram?


A dendrogram is a tree-like diagram that represents the hierarchical relationships between
different clusters or objects in a dataset, based on their similarities or distances.
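For example, a dendrogram can be drawn with SciPy's hierarchical clustering utilities; the toy data, labels, and the choice of average linkage below are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[1, 2], [2, 1], [8, 8], [9, 9], [5, 5]])  # toy data (assumed)

Z = linkage(X, method="average")        # agglomerative merge history
dendrogram(Z, labels=["a", "b", "c", "d", "e"])
plt.xlabel("objects")
plt.ylabel("merge distance")
plt.show()
```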

2021 April
9. Mention any two algorithms for hierarchical methods of
clustering.
Two algorithms for hierarchical methods of clustering are agglomerative clustering and
divisive clustering.

10. What is BIRCH?


BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a hierarchical
clustering algorithm that uses a tree-based approach to partition data into clusters in a
memory-efficient and scalable way.
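A minimal sketch of BIRCH using scikit-learn; the random data and the parameter values (threshold, branching factor, number of clusters) are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import Birch

X = np.random.RandomState(0).rand(200, 2)           # toy data (assumed)
model = Birch(threshold=0.1, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)                        # builds the CF-tree, then clusters its leaves
print(labels[:10])
```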

2022 April
9. What do you mean by agglomerative approach in
hierarchical clustering?
The agglomerative approach in hierarchical clustering starts by assigning each data point to
its own cluster, and then successively merges clusters based on a similarity criterion, until a
stopping criterion is met and all data points are part of a single cluster.
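A minimal sketch of the agglomerative approach using scikit-learn's AgglomerativeClustering; the toy data and the choice of two clusters with average linkage are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [2, 1], [8, 8], [9, 9], [5, 5]])  # toy data (assumed)
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)     # each point starts alone; the closest clusters are merged
print(labels)                     # cluster index of each point
```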
10. Differentiate bottom-up and top-down strategy in
hierarchical clustering.
In bottom-up or agglomerative clustering, each data point starts in its own cluster and
clusters are successively merged together. In top-down or divisive clustering, all data points
start in a single cluster, which is successively divided into smaller clusters based on a
dissimilarity criterion.
Part B
Module 4 : Data Mining

2020 March
18. Differentiate the concept of CLARA and CLARANS.
CLARA and CLARANS are both clustering algorithms, but they differ in their approach to
finding clusters in the data. CLARA stands for Clustering Large Applications, and it is a
partitioning algorithm that is based on a sample of the data rather than the entire dataset.
The sample is selected randomly, and the clustering is performed on the sample using a
partitioning algorithm such as PAM (k-medoids). CLARANS, on the other hand, stands for Clustering
Large Applications based on RANdomized Search, and it is a metaheuristic algorithm that
searches for clusters in a more flexible manner. CLARANS explores the solution space
using a hill-climbing technique that combines local search and randomization. CLARANS
can handle larger datasets than CLARA and is more flexible in terms of the shape and size
of clusters it can find.
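A minimal sketch of CLARA's sampling idea (an illustration only, not the full algorithm): draw several random samples, search for good medoids within each sample, and keep the medoid set that is cheapest on the full dataset. The sample size, number of samples, and data below are assumptions:

```python
import numpy as np
from itertools import combinations

def cost(X, medoid_points):
    """Sum of distances from every row of X to its nearest medoid point."""
    d = np.linalg.norm(X[:, None, :] - medoid_points[None, :, :], axis=2)
    return d.min(axis=1).sum()

def clara(X, k=2, n_samples=5, sample_size=10, seed=0):
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for _ in range(n_samples):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        S = X[idx]
        # exhaustive PAM-style search over the sample only (fine for toy sizes)
        cands = [np.array(c) for c in combinations(range(len(S)), k)]
        local = min(cands, key=lambda c: cost(S, S[c]))
        full_cost = cost(X, S[local])     # score the sample's medoids on ALL the data
        if full_cost < best_cost:
            best, best_cost = S[local], full_cost
    return best, best_cost

X = np.random.default_rng(1).random((100, 2))   # toy data (assumed)
medoids, c = clara(X)
print(medoids, round(c, 2))
```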

19. Explain the concept of direct and indirect density reachability.
Direct and indirect density reachability are two concepts in density-based clustering that are
used to decide whether two data points belong to the same cluster. A point q is directly
density-reachable from a point p if q lies within a distance ε (eps) of p and p is a core point,
that is, p has at least MinPts points in its ε-neighborhood. A point q is indirectly
density-reachable (usually just called density-reachable) from p if there is a chain of points
p = p1, p2, ..., pn = q such that each point in the chain is directly density-reachable from the
previous one. The chain can be of any length and does not need to lie along a straight line.
Core points are the points that satisfy the MinPts condition, while border points are points
that are not core themselves but are density-reachable from a core point. These concepts
are central to density-based clustering algorithms such as DBSCAN and OPTICS, which rely
on density reachability to grow clusters in the data.
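A minimal sketch of these two definitions; the values of eps and min_pts and the toy data below are illustrative assumptions:

```python
import numpy as np

def neighbors(X, i, eps):
    """Indices of points within eps of point i (its eps-neighborhood)."""
    return np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]

def directly_reachable(X, p, q, eps=1.5, min_pts=3):
    """q is directly density-reachable from p if q lies in p's
    eps-neighborhood and p is a core point."""
    n = neighbors(X, p, eps)
    return (q in n) and (len(n) >= min_pts)

def density_reachable(X, p, q, eps=1.5, min_pts=3):
    """q is (indirectly) density-reachable from p if a chain of
    directly reachable points leads from p to q."""
    frontier, seen = [p], {p}
    while frontier:
        cur = frontier.pop()
        if cur == q:
            return True
        for nxt in neighbors(X, cur, eps):
            if nxt not in seen and directly_reachable(X, cur, nxt, eps, min_pts):
                seen.add(nxt)
                frontier.append(nxt)
    return False

X = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [8, 8]], dtype=float)
print(density_reachable(X, 0, 3))   # True: connected through a chain of core points
print(density_reachable(X, 0, 4))   # False: (8, 8) is isolated
```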

2021 April
18. Explain the contingency table for binary variables.
A contingency table is a tabular representation of two categorical variables that displays the
frequency distribution of their combinations. It is commonly used to analyze the relationship
between two binary variables, where each variable can take only two possible values, such
as true or false, yes or no, or 0 or 1. The table has two rows and two columns, where each
row represents one value of one variable, and each column represents one value of the
other variable. The cells in the table contain the frequencies of the combinations of the two
variables. The contingency table can be used to calculate various measures of association
between the variables, such as the chi-square statistic, the odds ratio, and the phi
coefficient.
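For example, a 2x2 contingency table and a chi-square test can be computed as in the sketch below; the two binary variables (smoker, disease) and their values are illustrative assumptions:

```python
import pandas as pd
from scipy.stats import chi2_contingency

smoker = pd.Series([1, 1, 0, 0, 1, 0, 1, 0, 0, 1], name="smoker")
disease = pd.Series([1, 0, 0, 0, 1, 0, 1, 0, 1, 1], name="disease")

table = pd.crosstab(smoker, disease)       # 2x2 table of frequency counts
print(table)

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)                             # measure of association between the two variables
```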

19. Differentiate the concept of CLARA and CLARANS.


(Same answer as 18th question from 2020 March paper)

2022 April
18. Explain the applications of clustering.
Clustering is a data mining technique that aims to group similar objects into clusters based
on their similarity or distance in a high-dimensional space. Clustering has various
applications in different fields, such as marketing, biology, computer science, and social
science. In marketing, clustering can be used to segment customers into different groups
based on their purchasing behaviors or demographics, which can help to design targeted
marketing campaigns. In biology, clustering can be used to group genes or proteins with
similar functions, which can help to understand biological processes and diseases. In
computer science, clustering can be used to group similar documents or images, which can
help to organize and retrieve information efficiently. In social science, clustering can be used
to group people with similar opinions or behaviors, which can help to understand social
dynamics and trends.

19. Explain the concept of direct and indirect density reachability.
(Same answer as 19th question from 2020 March paper)
Part C
Module 4 : Data Mining

2020 March
25. Explain the requirements for clustering.
Clustering is an unsupervised learning technique that groups similar data points together
based on their similarity or distance. To ensure that the clustering algorithm produces
accurate and meaningful results, certain requirements need to be fulfilled. The following are
some of the main requirements for clustering:

● Similarity measure: A distance metric or similarity measure must be defined to
calculate the distance or similarity between any two data points. The similarity
measure used must be appropriate for the data being clustered and should take into
account the domain-specific characteristics of the data.

● Scaling: Clustering is highly sensitive to the scale of the data. Therefore, it is
important to ensure that the data has been properly scaled to eliminate any bias
introduced by different scales of measurement.

● Noise handling: Clustering algorithms can be highly sensitive to noise, outliers, and
irrelevant data points. Therefore, it is important to identify and remove such data
points before clustering.

● Handling large datasets: Clustering algorithms can be computationally expensive
and may not be suitable for large datasets. Therefore, efficient algorithms must be
used to handle large datasets.

● Evaluation: The quality of the clustering results must be evaluated to ensure that the
results are meaningful and useful. This can be done by using various metrics such as
silhouette coefficient, Davies-Bouldin index, or purity.

By ensuring that these requirements are met, clustering algorithms can produce accurate
and meaningful results that can be used for a variety of applications such as customer
segmentation, image segmentation, and anomaly detection.
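As an illustration of the scaling and evaluation requirements above, the sketch below standardizes features of very different scales before clustering and then scores the result with the silhouette coefficient; the random data, the use of k-means as the clustering step, and k = 3 are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.RandomState(0).rand(100, 2) * [1, 1000]     # features on very different scales

X_scaled = StandardScaler().fit_transform(X)               # scaling removes the scale bias
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(silhouette_score(X_scaled, labels))                  # evaluate cluster quality
```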

2021 April
25. Explain the concept of DBSCAN algorithm.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular
clustering algorithm that groups together data points that are closely located to each other in
high-density regions. The algorithm uses two important parameters, epsilon (ε) and minPts,
to identify clusters. Epsilon defines the maximum distance between two data points to be
considered neighbors and minPts defines the minimum number of data points required to
form a dense region.

The algorithm starts by randomly selecting a point from the dataset and finding all the
neighboring points that lie within ε distance. If the number of neighboring points is greater
than or equal to minPts, then a cluster is formed, and the process is repeated for all the
neighboring points until no more points can be added to the cluster. If the number of
neighboring points is less than minPts, then the point is considered as noise and excluded
from the cluster. The process is repeated until all the points are assigned to a cluster or
marked as noise.

DBSCAN has several advantages over other clustering algorithms such as its ability to find
clusters of arbitrary shapes and its ability to handle noise in the data. However, it requires
careful selection of the parameters ε and minPts, and its performance can be affected by the
density of the data and the dimensionality of the feature space.
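A minimal sketch of DBSCAN with scikit-learn; the parameter values (eps = 1.5, min_samples = 3) and the toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],      # one dense region
              [8, 8], [8, 9], [9, 8], [9, 9],      # another dense region
              [5, 15]], dtype=float)               # an isolated point

labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X)
print(labels)   # two clusters (0 and 1); the isolated point is labelled -1 (noise)
```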

2022 April
25. Explain with an example the K-medoids algorithm.
K-medoids algorithm is a clustering algorithm that aims to partition a dataset into k
clusters, where each cluster is represented by one of its data points, known as the medoid.
The algorithm is similar to K-means but is more robust to noise and outliers. K-medoids
algorithm uses a dissimilarity measure to calculate the distance between each point and its
corresponding medoid. The algorithm iteratively updates the medoids and assigns each data
point to the closest medoid until convergence.

Let's consider an example to illustrate the K-medoids algorithm. Suppose we have a dataset
of five points in a 2D space, (2,3), (3,2), (4,2), (4,4), and (5,4). We want to cluster these
points into two groups using the K-medoids algorithm. We can start by randomly selecting
two medoids from the dataset, say (2,3) and (5,4). We can then calculate the dissimilarity of
each point to each medoid using a distance metric, such as Euclidean distance.

For instance, the dissimilarity of (3,2) to (2,3) is √((3−2)² + (2−3)²) = √2 ≈ 1.41, and its
dissimilarity to (5,4) is √((3−5)² + (2−4)²) = √8 ≈ 2.83. Similarly, we can calculate the
dissimilarity of every other point to both medoids.

After calculating the dissimilarity of each point to each medoid, we assign each point to the
medoid that it is closest to. Here, (2,3), (3,2), and (4,2) are assigned to the first medoid
(2,3), while (4,4) and (5,4) are assigned to the second medoid (5,4). (The point (4,2) is
equidistant from both medoids, at a distance of about 2.24, so it may be assigned to either;
we assign it to the first.) We then calculate the sum of the dissimilarities of each point to its
corresponding medoid.

In this case, the sum of the dissimilarities is 0 + 1.41 + 2.24 = 3.65 for the first medoid's
cluster and 1.0 + 0 = 1.0 for the second medoid's cluster, giving a total cost of about 4.65.
The algorithm then tries swapping a medoid with a non-medoid point and keeps the swap
that gives the lowest total cost. In this case, replacing the first medoid (2,3) with (3,2)
reduces the total cost to about 3.41, so (3,2) becomes the new medoid, while the second
medoid remains (5,4). We then repeat the process of assigning each point to its closest
medoid and updating the medoids until no swap improves the cost. The final result is two
clusters: (2,3), (3,2), and (4,2) in one cluster and (4,4) and (5,4) in the other cluster.
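The following minimal PAM-style sketch in NumPy reproduces this worked example; it is a simplified illustration (greedy swap search on a tiny dataset), not an optimized implementation:

```python
import numpy as np

X = np.array([[2, 3], [3, 2], [4, 2], [4, 4], [5, 4]], dtype=float)

def cost(medoid_idx):
    """Total distance from each point to its nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

medoids = [0, 4]                    # start from the initial medoids (2,3) and (5,4)
improved = True
while improved:
    improved = False
    for m in list(medoids):
        for o in range(len(X)):
            if o in medoids:
                continue
            candidate = [o if i == m else i for i in medoids]
            if cost(candidate) < cost(medoids):       # swap only if it lowers the total cost
                medoids, improved = candidate, True

d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
print(X[medoids])          # final medoids, here (3, 2) and (5, 4)
print(d.argmin(axis=1))    # cluster index of each point
```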
