Partition

The document discusses various clustering methods, including partition-based methods like K-Means, K-Medoids, and CLARANS, which focus on dividing datasets into distinct groups. It also covers density-based methods such as DBSCAN and OPTICS, as well as hierarchical clustering techniques, including Agglomerative and Divisive clustering, highlighting their algorithms, advantages, and disadvantages. Additionally, it emphasizes the importance of choosing appropriate parameters and the visualization of clustering results through dendrograms.


Partition-Based Clustering Methods

Partitioning clustering methods divide a dataset into distinct groups (clusters) such that data
points in the same group are more similar to each other than to those in different groups. The
goal is often to minimize some criterion, such as the sum of squared errors (SSE). Four common
methods are described below:

1. K-Means Clustering

 Concept: K-means partitions data into k clusters, where each cluster is represented by
its centroid (the mean of the points in the cluster).
 Algorithm:
1. Choose k initial centroids (for example, at random).
2. Assign each data point to the nearest centroid.
3. Recalculate each centroid as the mean of all points assigned to it.
4. Repeat steps 2-3 until the centroids no longer change significantly.
 Criterion: Minimizes the sum of squared distances between points and their assigned
cluster centroids (the SSE). A minimal sketch of the loop is given below.
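To make the loop concrete, here is a minimal NumPy sketch of the algorithm above (the function name
kmeans, the random seeding and the convergence tolerance are illustrative choices, not part of the
original notes):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Minimal k-means: X is an (n, d) array, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # step 1: random initial centroids
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of the points assigned to it.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids barely move.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    sse = ((X - centroids[labels]) ** 2).sum()                    # the SSE criterion being minimized
    return labels, centroids, sse
```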

2. K-Medoids Clustering

 Concept: Similar to k-means, but instead of centroids (mean values), it selects actual data
points (medoids) to represent the clusters.
 Algorithm:
1. Initialize k medoids (representative points from the dataset).
2. Assign each point to the closest medoid.
3. Swap a medoid with a non-medoid point whenever the swap improves the clustering
(lowers the total dissimilarity).
4. Repeat until no beneficial swap remains.
 Advantage: More robust to outliers than k-means because it uses real data points as cluster
representatives. A sketch of the swap loop follows below.
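A minimal sketch of the swap-based idea (essentially PAM). The names are illustrative, and a real
implementation would precompute and cache the pairwise distances instead of recomputing the cost
inside the loop:

```python
import numpy as np

def total_dissimilarity(X, medoid_idx):
    # Each point is charged the distance to its nearest medoid.
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def k_medoids(X, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))     # step 1: initial medoids
    best_cost = total_dissimilarity(X, medoids)
    improved = True
    while improved:                                               # step 4: stop when no swap helps
        improved = False
        for i in range(k):                                        # step 3: try medoid/non-medoid swaps
            for h in range(len(X)):
                if h in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = h
                c = total_dissimilarity(X, candidate)
                if c < best_cost:
                    medoids, best_cost, improved = candidate, c, True
    # Step 2 (final assignment): each point joins the cluster of its closest medoid.
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return medoids, d.argmin(axis=1)
```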

3. CLARANS (Clustering Large Applications based upon RANdomized Search)

 Concept: An improved version of k-medoids that searches for good medoids using a
randomized search over candidate swaps.
 Algorithm:
1. Start with an initial set of medoids.
2. Randomly select a subset of candidate swaps instead of evaluating all possible
swaps.
3. Accept a swap if it improves the clustering cost.
4. Repeat until no significant improvement is found.
 Advantage: More scalable than plain k-medoids (PAM) for large datasets. A rough sketch is
given below.
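A rough sketch of the randomized search; numlocal and maxneighbor mirror the parameters of the
original CLARANS algorithm, but the code is only an illustration, not a faithful reimplementation:

```python
import numpy as np

def cost(X, medoid_idx):
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def clarans(X, k, numlocal=2, maxneighbor=50, seed=0):
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for _ in range(numlocal):                         # restart the local search a few times
        current = list(rng.choice(len(X), size=k, replace=False))
        current_cost = cost(X, current)
        tried = 0
        while tried < maxneighbor:                    # examine only a random sample of swaps
            i = int(rng.integers(k))
            h = int(rng.integers(len(X)))
            if h in current:
                continue
            neighbor = current.copy()
            neighbor[i] = h
            neighbor_cost = cost(X, neighbor)
            if neighbor_cost < current_cost:          # accept an improving swap, reset the counter
                current, current_cost, tried = neighbor, neighbor_cost, 0
            else:
                tried += 1
        if current_cost < best_cost:                  # keep the best local optimum found
            best, best_cost = current, current_cost
    return best, best_cost
```
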
4. CLARA (Clustering LARge Applications; Kaufmann and Rousseeuw, 1990)

 Concept: Draws multiple samples of the data set, applies PAM (k-medoids) to each sample, and
returns the best of the resulting clusterings. It is built into statistical analysis packages
such as S+.
 Strength: Deals with larger data sets than PAM.
 Weaknesses:
o Efficiency depends on the sample size.
o A good clustering based on samples will not necessarily represent a good clustering of the
whole data set if the sample is biased.

A sketch of the sampling idea follows below.
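A sketch of the sampling idea: draw a few samples, run a small swap-based PAM (like the k-medoids
sketch above, repeated here so the block runs on its own) on each sample, and keep the medoid set
that is cheapest on the full data set. The default sample size 40 + 2k echoes the value suggested by
Kaufmann and Rousseeuw; everything else is an illustrative choice:

```python
import numpy as np

def cost(X, medoid_idx):
    # Total dissimilarity: each point is charged the distance to its nearest medoid.
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def pam(X, k, rng):
    # Tiny swap-based PAM, run only on a sample.
    medoids = list(rng.choice(len(X), size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(X)):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                if cost(X, trial) < cost(X, medoids):
                    medoids, improved = trial, True
    return medoids

def clara(X, k, n_draws=5, sample_size=None, seed=0):
    rng = np.random.default_rng(seed)
    sample_size = sample_size or min(len(X), 40 + 2 * k)
    best, best_cost = None, np.inf
    for _ in range(n_draws):
        idx = rng.choice(len(X), size=sample_size, replace=False)  # draw a sample
        medoids_in_sample = pam(X[idx], k, rng)                    # cluster the sample with PAM
        medoids = [int(idx[m]) for m in medoids_in_sample]
        c = cost(X, medoids)                                       # evaluate on the FULL data set
        if c < best_cost:
            best, best_cost = medoids, c
    return best, best_cost
```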

Problem: Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).

Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7. Run the k-means algorithm for
1 epoch only. At the end of this epoch show:
a) The new clusters (i.e. the examples belonging to each cluster)
b) The centers of the new clusters
c) Draw a 10 by 10 space with all the 8 points and show the clusters after the first epoch and the new
centroids.
d) How many more iterations are needed to converge? Draw the result for each epoch. (A short script
that carries out the first epoch is sketched below.)
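For checking the hand calculation, here is a short script (not part of the original exercise) that
performs the first epoch with seeds A1, A4 and A7 and prints the resulting clusters and the new
centers:

```python
import numpy as np

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
names = list(points)
X = np.array([points[n] for n in names], dtype=float)
centers = np.array([points["A1"], points["A4"], points["A7"]], dtype=float)  # initial seeds

# One epoch: assign every point to its nearest seed, then recompute the cluster means.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)
for c in range(3):
    members = [names[i] for i in range(len(names)) if labels[i] == c]
    print(f"cluster {c + 1}: {members}")
new_centers = np.array([X[labels == c].mean(axis=0) for c in range(3)])
print("new centers:\n", new_centers)

# Expected after epoch 1: {A1}, {A3, A4, A5, A6, A8}, {A2, A7},
# with new centers (2, 10), (6, 6) and (1.5, 3.5).
```
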
Density-Based Clustering Methods

Density-based clustering groups data based on areas of high density, separating out low-density areas as
noise or outliers. These methods are particularly good for discovering clusters of arbitrary shape and
handling noise.

📌 Common Methods:

✅ DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

 Key idea: Clusters are formed from regions of high density separated by regions of low density.

 Parameters:

o ε (epsilon): radius of neighborhood around a point.

o minPts: minimum number of points required to form a dense region.

 Steps:

1. Label each point as core, border, or noise.

2. Connect core points within ε of each other.

3. Expand clusters from core points.

4. Border points are assigned to the cluster of a nearby core point (within ε); points that are
neither core nor border are treated as noise. A minimal scikit-learn sketch follows below.
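A minimal usage sketch with scikit-learn's DBSCAN; the toy data and the eps / min_samples values are
illustrative and would need tuning on real data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two dense blobs plus a few scattered outliers.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.uniform(low=-2, high=7, size=(5, 2)),
])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)    # eps = ε neighborhood radius, min_samples = minPts
labels = db.labels_                           # one cluster id per point; -1 marks noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
```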

✅ OPTICS (Ordering Points To Identify the Clustering Structure)

 Extension of DBSCAN that handles clusters of varying density better.

 Produces a reachability plot rather than an explicit clustering; clusters can be extracted from
the plot afterwards (see the sketch below).
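A short sketch with scikit-learn's OPTICS; the reachability plot is simply the reachability_ values
taken in ordering_ order (the data and parameter values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import OPTICS

# Two clusters of different densities, which plain DBSCAN handles poorly with a single ε.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.2, size=(60, 2)),   # tight cluster
    rng.normal(loc=(4, 4), scale=0.8, size=(60, 2)),   # looser cluster
])

opt = OPTICS(min_samples=10).fit(X)
reachability = opt.reachability_[opt.ordering_]        # values of the reachability plot
plt.plot(reachability)
plt.xlabel("points in OPTICS ordering")
plt.ylabel("reachability distance")
plt.show()
print("cluster labels found:", np.unique(opt.labels_)) # -1 again marks noise
```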

📈 Advantages:

 Can find clusters of arbitrary shape.

 Handles noise well.

 No need to specify number of clusters (for DBSCAN).

⚠️ Disadvantages:

 Choosing optimal ε and minPts can be tricky.

 Struggles with clusters of varying densities (DBSCAN).

 Not efficient for high-dimensional data.


Hierarchical Clustering Methods in Detail

Hierarchical clustering is a method of clustering that creates a hierarchy of clusters in the form of a tree
structure called a dendrogram. Unlike K-means clustering, hierarchical clustering does not require
specifying the number of clusters beforehand.

Hierarchical clustering can be divided into two main types:


1. Agglomerative Hierarchical Clustering (AHC) – Bottom-up approach

2. Divisive Hierarchical Clustering – Top-down approach

Let’s explore both in detail:

1️⃣ Agglomerative Hierarchical Clustering (AHC) – Bottom-Up Approach

Agglomerative clustering starts with each data point as its own cluster and merges the most similar
clusters at each step until only one cluster remains.

🔹 Steps of Agglomerative Clustering:

1. Start with each data point as its own cluster.

2. Compute distances (or similarity) between all clusters.

3. Merge the two closest clusters.

4. Repeat steps 2–3 until all points are in one cluster or the desired number of clusters is reached.

5. Dendrogram Analysis: The hierarchical structure can be visualized using a dendrogram, where
we can cut at different levels to get different numbers of clusters.

🔹 Linkage Criteria (How to Measure Distance Between Clusters?)

To decide which clusters to merge, different linkage methods can be used:

 Single Linkage: distance between the closest (nearest) points of the two clusters.

 Complete Linkage: distance between the farthest points of the two clusters.

 Average Linkage: average of all pairwise distances between points in the two clusters.

 Centroid Linkage: distance between the centroids (mean points) of the two clusters.

 Ward’s Method: minimizes the variance within each cluster to form compact groups.

Example:
Consider five points in 2D space. Using single linkage, the two closest points merge first, and this
process continues iteratively. (The SciPy sketch below builds such a hierarchy, plots the dendrogram,
and cuts the tree into clusters.)
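A compact SciPy sketch that builds the hierarchy for a handful of 2-D points, draws the dendrogram,
and cuts the tree into a chosen number of clusters (the points and the choice of Ward linkage are
illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Five illustrative 2-D points, as in the example above.
X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.2, 4.8], [9, 1]], dtype=float)

Z = linkage(X, method="ward")       # other options: "single", "complete", "average", "centroid"
dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5"])
plt.ylabel("merge distance")
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters
print(labels)
```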

🔹 Advantages of Agglomerative Clustering


✔️No need to predefine the number of clusters.
✔️Can handle non-spherical clusters better than K-means.
✔️Produces a dendrogram for hierarchical visualization.

🔹 Disadvantages

❌ Computationally expensive (O(n² log n)).


❌ Merging decisions are irreversible (no backtracking).
❌ Sensitive to outliers and noise.

2️⃣ Divisive Hierarchical Clustering – Top-Down Approach

Divisive clustering takes the opposite approach of Agglomerative clustering. It starts with one large
cluster and splits it iteratively into smaller clusters until each data point is its own cluster.

🔹 Steps of Divisive Clustering

1. Start with all data points in one cluster.

2. Split the cluster into two smaller clusters based on dissimilarity.

3. Repeat the process recursively until each data point is its own cluster.

4. Dendrogram Analysis: Like Agglomerative clustering, we can cut the dendrogram at a suitable
level to determine clusters.

🔹 How to Split a Cluster?

The most common approaches are:

 Using k-means or spectral clustering to divide a cluster into two at each step (a bisecting
k-means sketch is given below).

 Using Principal Component Analysis (PCA) to find a good direction along which to split the
cluster.
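Divisive clustering is rarely available off the shelf, so the sketch below illustrates the top-down
idea with a simple bisecting k-means: repeatedly split the largest remaining cluster into two with
2-means until the desired number of clusters is reached. The function name and the "split the
largest cluster" rule are illustrative choices, not a standard implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters, seed=0):
    clusters = [np.arange(len(X))]                      # start with everything in one cluster
    while len(clusters) < n_clusters:
        # Pick the largest cluster and split it into two with 2-means.
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(largest)
        km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X[idx])
        clusters.append(idx[km.labels_ == 0])
        clusters.append(idx[km.labels_ == 1])
    labels = np.empty(len(X), dtype=int)                # flatten the partition into labels
    for c, idx in enumerate(clusters):
        labels[idx] = c
    return labels

# Toy usage: three blobs, split top-down into 3 clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.4, size=(40, 2)) for m in [(0, 0), (4, 0), (2, 4)]])
print(np.bincount(bisecting_kmeans(X, 3)))
```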

🔹 Advantages of Divisive Clustering

✔️More accurate than Agglomerative clustering in some cases.


✔️Can handle large datasets if implemented efficiently.

🔹 Disadvantages

❌ Computationally expensive (worse than Agglomerative).


❌ Less commonly used in practice because of its high cost.

3️⃣ Dendrogram – Visualizing Hierarchical Clustering


A dendrogram is a tree-like diagram that represents the sequence of merging (in Agglomerative
clustering) or splitting (in Divisive clustering).

 The vertical axis represents the distance or dissimilarity between clusters.

 The horizontal axis represents the data points.

 Cutting the dendrogram at different levels results in different cluster formations.

Example of Dendrogram Usage

 If we cut the dendrogram at a high level, we get fewer clusters.

 If we cut it lower, we get more detailed clustering.

When to Use Hierarchical Clustering?

✔ Small to Medium datasets (not scalable for very large data).


✔ When hierarchical relationships in data are important.
✔ When you don’t know the number of clusters beforehand.
✔ When clusters are not well-separated or non-spherical.

🚫 Not recommended for very large datasets due to high computational cost.
Video references:
https://www.youtube.com/watch?v=oNYtYm0tFso
https://www.youtube.com/watch?v=0A0wtto9wHU
https://www.youtube.com/watch?v=35VgJ84sqqI
https://www.youtube.com/watch?v=jcdT_pVRqlE
