L08 Hierarchical Agglomerative Clustering
Clustering
Hierarchical Agglomerative Clustering
• Start with every instance as its own point on an Income vs. Age scatter plot.
• Find the closest pair of points and merge them into a cluster.
• Find the next closest pair and merge; keep merging closest pairs.
• If the closest pair is two clusters, merge the two clusters.
• Keep merging closest pairs and clusters: the number of clusters drops step by step (6, 5, 4, 3, 2) until only one cluster remains. A small sketch of this merge process follows the list.
[Figures: Income vs. Age scatter plots showing each merge step]
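A minimal sketch of this merge order using SciPy's hierarchical clustering; the handful of Age/Income values below is invented purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Each row is one person: [age, income] (made-up illustrative values)
X = np.array([
    [23, 28_000],
    [25, 31_000],
    [31, 52_000],
    [33, 55_000],
    [45, 90_000],
    [47, 95_000],
])

# linkage() repeatedly merges the two closest points/clusters.
# Each row of Z records one merge: [index_a, index_b, merge_distance, new_cluster_size]
Z = linkage(X, method="single", metric="euclidean")
print(Z)
```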
Hierarchical Agglomerative Clustering
The merge sequence can also be tracked on the dendrogram: at each step the cluster assignments (Income vs. Age) are shown next to the dendrogram (Cluster on the x-axis, distance on the y-axis) as the current number of clusters goes from 5 to 4, 3, and 2. A sketch of cutting the dendrogram at a chosen number of clusters follows.
[Figures: Income vs. Age scatter plots paired with the corresponding dendrograms at each step]
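A minimal sketch, assuming the same made-up Age/Income points as before, using SciPy's fcluster to cut the tree into k clusters and dendrogram to draw it.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Same illustrative [age, income] points as in the previous sketch
X = np.array([[23, 28_000], [25, 31_000], [31, 52_000],
              [33, 55_000], [45, 90_000], [47, 95_000]])
Z = linkage(X, method="single")

# Cut the tree at successive cluster counts, as in the slides
for k in (5, 4, 3, 2):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(f"k = {k}: {labels}")

# The dendrogram shows every merge; the y-axis is the merge distance
dendrogram(Z)
plt.xlabel("Cluster")
plt.ylabel("distance")
plt.show()
```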
Agglomerative Clustering
• Start with N groups, each containing one instance, and merge the two closest groups at each iteration.
• Distance between two groups $G_i$ and $G_j$:
  • Single-link: $d(G_i, G_j) = \min_{x \in G_i,\, y \in G_j} d(x, y)$
  • Complete-link: $d(G_i, G_j) = \max_{x \in G_i,\, y \in G_j} d(x, y)$
  • Average-link: $d(G_i, G_j) = \frac{1}{|G_i|\,|G_j|} \sum_{x \in G_i} \sum_{y \in G_j} d(x, y)$; centroid: $d(G_i, G_j) = d(\mu_i, \mu_j)$, where $\mu_i$, $\mu_j$ are the group means.
A small numerical sketch of these group distances follows.
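A quick numerical illustration of the group distances above; the two small groups of 2-D points are invented for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two illustrative groups Gi and Gj (made-up points)
Gi = np.array([[1.0, 2.0], [2.0, 1.0]])
Gj = np.array([[8.0, 9.0], [9.0, 8.0], [10.0, 10.0]])

D = cdist(Gi, Gj)            # all pairwise distances between the two groups
single   = D.min()           # single-link:   closest pair
complete = D.max()           # complete-link: farthest pair
average  = D.mean()          # average-link:  mean over all pairs
centroid = np.linalg.norm(Gi.mean(axis=0) - Gj.mean(axis=0))  # distance between group means
print(single, complete, average, centroid)
```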
Hierarchical Linkage Types
• Single linkage: minimum pairwise distance between clusters
• Complete linkage: maximum pairwise distance between clusters
• Average linkage: average pairwise distance between clusters
• Ward linkage: merge the pair of clusters that gives the smallest increase in total within-cluster variance (inertia)
[Figures: Income vs. Age scatter plots illustrating each linkage type]

Example: Single-Link Clustering
[Figure: dendrogram produced by single-link clustering]

A scikit-learn sketch of these linkage options follows.
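A short sketch of the same linkage choices using scikit-learn's AgglomerativeClustering, on the same made-up Age/Income points as in the earlier sketches.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative [age, income] points (same invented data as before)
X = np.array([[23, 28_000], [25, 31_000], [31, 52_000],
              [33, 55_000], [45, 90_000], [47, 95_000]])

for linkage in ("single", "complete", "average", "ward"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    print(f"{linkage:>8}: {labels}")
```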
K-means vs Hierarchical Clustering
K-means Clustering:
• can handle big data well
• time complexity is linear, i.e., O(n)
• starts with a random choice of clusters, so the results of running the algorithm multiple times may differ
• works well when the clusters are hyperspherical (a circle in 2D, a sphere in 3D)
• requires prior knowledge of K, i.e., the number of clusters
Hierarchical Clustering:
• cannot handle big data well
• time complexity is quadratic, i.e., O(n²)
• results are reproducible
• can stop at whatever number of clusters is found appropriate by interpreting the dendrogram
A rough side-by-side sketch of the two algorithms follows.
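A rough sketch comparing the two algorithms on synthetic data; the blob parameters are arbitrary. K-means needs K and a random initialisation, while agglomerative clustering is deterministic and the tree can be cut afterwards.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Three synthetic blobs (arbitrary illustrative parameters)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

km = KMeans(n_clusters=3, n_init=10).fit(X)        # random starts; labels may vary per run
hc = AgglomerativeClustering(n_clusters=3).fit(X)  # same sequence of merges every run
print(km.labels_[:10], hc.labels_[:10])
```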
Other Types of Clustering
Mini-Batch K-Means, Affinity Propagation, Mean Shift, Spectral Clustering, Ward, DBSCAN
DBSCAN
Clustering
Density-Based Clustering Algorithms
• Density-Based Clustering
  • identifies distinctive groups/clusters in the data, based on the idea that a cluster in data space is a contiguous region of high point density, separated from other such clusters by contiguous regions of low point density
• Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • the base algorithm for density-based clustering
  • can discover clusters of different shapes and sizes from a large amount of data that contains noise and outliers
DBSCAN Algorithm
1. Start with a random unvisited data point. All points within a distance Ɛ (epsilon) of it form its neighborhood.
2. If the neighborhood contains at least a minimum number of points, clustering starts and the current point becomes the first point of a new cluster; otherwise the point is labeled as noise. In either case, the point is marked as visited.
3. All points within distance Ɛ of a point in the cluster become part of the same cluster. Repeat this for every new point added to the cluster.
4. Continue until every point in the Ɛ-neighborhood of the cluster has been visited and labeled.
5. Once the cluster is complete, start again with a new unvisited point, leading to the discovery of further clusters or noise. The process ends when every point is marked as either part of a cluster or noise. (A minimal sketch of these steps appears after the source link below.)
Source: https://www.digitalvidya.com/blog/the-top-5-clustering-algorithms-data-scientists-should-know/
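A minimal from-scratch sketch that mirrors the five steps above; it is only an illustration, not a reference implementation, and the eps/min_pts values in the usage example are arbitrary.

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=5):
    n = len(X)
    labels = np.full(n, -1)           # -1 = noise (may be relabelled later)
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0

    def region_query(i):
        # Step 1: all points within distance eps of point i
        return np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            continue                   # Step 2: too few neighbours -> stays noise
        labels[i] = cluster_id         # Step 2: point i starts a new cluster
        queue = list(neighbours)
        while queue:                   # Steps 3-4: grow through eps-neighbourhoods
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id        # border/core point joins the cluster
            if not visited[j]:
                visited[j] = True
                j_neighbours = region_query(j)
                if len(j_neighbours) >= min_pts:
                    queue.extend(j_neighbours)  # j is a core point; keep expanding
        cluster_id += 1                # Step 5: continue with the next unvisited point
    return labels

# Example usage on made-up data: two synthetic blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
print(np.unique(dbscan(X, eps=0.5, min_pts=5)))
```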
Pros vs Cons of DBSCAN Clustering
• Pros
  • does not require a pre-set number of clusters, unlike many other clustering algorithms
  • identifies outliers as noise, unlike the Mean-Shift method, which forces such points into a cluster despite their different characteristics
  • finds arbitrarily shaped and sized clusters quite well
• Cons
  • not very effective when clusters have varying densities: the distance threshold Ɛ and the minimum number of points for identifying a neighborhood have to change as the density level changes
  • for high-dimensional data, determining the distance threshold Ɛ becomes a challenging task (see the sketch below)
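To illustrate the sensitivity to Ɛ mentioned above, a short scikit-learn sketch on synthetic data with two blobs of different density; the eps values tried are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two synthetic blobs of different density (illustrative only)
X = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(4, 1.0, (100, 2))])

for eps in (0.1, 0.3, 1.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise
    n_noise = int((labels == -1).sum())
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```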
Reference:
• https://www.guru99.com/unsupervised-machine-learning.html
• https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-unsupervised-learning#dimension-reduction
• https://www.ibm.com/cloud/learn/unsupervised-learning
• https://levelup.gitconnected.com/importance-of-data-preprocessing-and-scaling-in-machine-learning-21db1d4377ec
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
• https://www.digitalvidya.com/blog/the-top-5-clustering-algorithms-data-scientists-should-know/