
Hierarchical Clustering

Agglomerative Hierarchical Clustering and Example
Introduction
• Hierarchical clustering is a method of cluster analysis that builds a hierarchy of clusters. The most common variant is the agglomerative (bottom-up) approach, which starts with each data point as its own cluster and iteratively merges the closest pairs of clusters. The alternative is the divisive (top-down) approach, which starts with all data points in one cluster and splits clusters until each data point is its own cluster.
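As a quick illustration of the bottom-up idea, the sketch below uses scikit-learn's AgglomerativeClustering on the five points from the example later in these slides (scikit-learn is assumed to be installed, and stopping at n_clusters=2 is an illustrative choice):

# A minimal sketch of agglomerative clustering in practice, assuming
# scikit-learn is installed; stopping at n_clusters=2 is illustrative.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])  # points A..E from the example

# Bottom-up: each point starts as its own cluster and the closest
# clusters are merged until only n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="single")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1] -- {A, B, C} vs {D, E}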
Steps in Agglomerative Hierarchical Clustering
• 1. Compute the Distance Matrix
• 2. Initialize Clusters
• 3. Find the Closest Clusters
• 4. Merge the Closest Clusters
• 5. Update the Distance Matrix
• 6. Repeat until one cluster remains
• 7. Create a Dendrogram
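The sketch below is one possible way to carry out these steps in Python with SciPy (library availability and the variable names are assumptions; the data is the example used on the next slides):

# A sketch of steps 1-7 with SciPy (assumes numpy and scipy are installed;
# variable names are illustrative).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])  # A..E

# Step 1: pairwise Euclidean distances (condensed form)
dists = pdist(points, metric="euclidean")
print(np.round(squareform(dists), 2))  # full 5x5 distance matrix

# Steps 2-6: repeatedly merge the two closest clusters (single linkage)
Z = linkage(dists, method="single")
print(np.round(Z, 2))  # one row per merge: [cluster_i, cluster_j, distance, size]

# Step 7: scipy.cluster.hierarchy.dendrogram(Z) draws the dendrogram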
Example: Agglomerative Hierarchical Clustering
• Data Points: A(1, 2), B(2, 3), C(3, 4), D(8, 7),
E(9, 8)
• 1. Compute Distance Matrix (using Euclidean
distance)
• 2. Initialize Clusters: A, B, C, D, E
• 3. Merge closest clusters (A and B) -> AB
• 4. Update Distance Matrix
• 5. Repeat merging and updating until one
cluster remains.
Step 1: Compute the Distance Matrix
• Use the Euclidean distance formula:
• d = sqrt((x1 - x2)^2 + (y1 - y2)^2)

• For the dataset:


• A (1, 2), B (2, 3), C (3, 4), D (8, 7), E (9, 8)

• Distance Matrix:
•        A      B      C      D      E
• A    0.00   1.41   2.83   8.60  10.00
• B    1.41   0.00   1.41   7.21   8.60
• C    2.83   1.41   0.00   5.83   7.21
• D    8.60   7.21   5.83   0.00   1.41
• E   10.00   8.60   7.21   1.41   0.00
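The matrix above can be reproduced directly from the formula; a small NumPy check (NumPy assumed installed):

# Apply the Euclidean distance formula to all pairs of points
# (a quick numerical check of the matrix above; assumes NumPy is installed).
import numpy as np

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])  # A..E
diff = points[:, None, :] - points[None, :, :]   # pairwise coordinate differences
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))  # sqrt((x1-x2)^2 + (y1-y2)^2)
print(np.round(dist_matrix, 2))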
Step 2: Initialize Clusters
• At the beginning, each point is its own cluster:
• Clusters: A, B, C, D, E
Step 3: Find the Closest Clusters
• Find the two closest clusters using the
distance matrix:

• Closest clusters: A and B, at a distance of 1.41 (B and C, and D and E, are also 1.41 apart; such ties can be broken arbitrarily)
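Programmatically, the closest pair is the smallest off-diagonal entry of the distance matrix; a sketch (NumPy assumed, with dist_matrix recomputed so the snippet stands alone):

# Locate the smallest off-diagonal entry of the distance matrix
# (sketch; recomputes dist_matrix so the snippet is self-contained).
import numpy as np

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])
labels = ["A", "B", "C", "D", "E"]
diff = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

np.fill_diagonal(dist_matrix, np.inf)  # ignore the zero self-distances
i, j = np.unravel_index(np.argmin(dist_matrix), dist_matrix.shape)
print(labels[i], labels[j], round(dist_matrix[i, j], 2))  # -> A B 1.41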
Step 4: Merge the Closest Clusters
• Merge A and B into a new cluster AB:
• Updated Clusters: AB, C, D, E
Step 5: Update the Distance Matrix
• Using single linkage (the distance between two clusters is the smallest distance between any point in one cluster and any point in the other), update the distance matrix:
Distance({AB}, C) = min(d(A, C), d(B, C)) = min(2.83, 1.41) = 1.41
Distance({AB}, D) = min(d(A, D), d(B, D)) = min(8.60, 7.21) = 7.21
Distance({AB}, E) = min(d(A, E), d(B, E)) = min(10.00, 8.60) = 8.60

•        AB     C      D      E
• AB   0.00   1.41   7.21   8.60
• C    1.41   0.00   5.83   7.21
• D    7.21   5.83   0.00   1.41
• E    8.60   7.21   1.41   0.00
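These single-linkage values can be verified with a few lines of NumPy (a sketch; mapping indices 0..4 to A..E is an assumption of the snippet):

# Single-linkage distances from the merged cluster AB to C, D and E
# (sketch; recomputes the base distance matrix; indices 0..4 stand for A..E).
import numpy as np

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])
diff = points[:, None, :] - points[None, :, :]
d = np.sqrt((diff ** 2).sum(axis=-1))

ab = [0, 1]  # indices of A and B
for k, name in [(2, "C"), (3, "D"), (4, "E")]:
    print(f"d(AB, {name}) = {d[ab, k].min():.2f}")  # min over the members of AB
# -> d(AB, C) = 1.41, d(AB, D) = 7.21, d(AB, E) = 8.60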
Step 6: Repeat
• Repeat merging the closest clusters until only
one cluster remains:
• 1. Merge AB and C -> ABC (distance 1.41)
• 2. Merge D and E -> DE (distance 1.41)
• 3. Merge ABC and DE -> Final Cluster (distance 5.83)
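SciPy's linkage matrix records exactly this merge sequence; a sketch to print it (SciPy assumed installed; implementations may order the tied 1.41 merges differently):

# Print the merge sequence recorded in the linkage matrix
# (sketch; assumes scipy is installed).
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])  # A..E -> ids 0..4
Z = linkage(points, method="single", metric="euclidean")

for step, (i, j, dist, size) in enumerate(Z, start=1):
    # merged ids >= 5 refer to clusters created in earlier steps
    print(f"step {step}: merge {int(i)} and {int(j)} at {dist:.2f} (size {int(size)})")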
Step 7: Create a Dendrogram
• A dendrogram illustrates the sequence of
merges.
• The height of branches represents the
distance at which clusters are merged.
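For this example, A, B, and C join at height 1.41, D and E join at height 1.41, and the two groups join at 5.83. A sketch to draw the dendrogram with SciPy and Matplotlib (both assumed installed):

# Draw the dendrogram for the example (sketch; assumes scipy and matplotlib).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

points = np.array([[1, 2], [2, 3], [3, 4], [8, 7], [9, 8]])
Z = linkage(points, method="single", metric="euclidean")

dendrogram(Z, labels=["A", "B", "C", "D", "E"])  # branch heights = merge distances
plt.ylabel("Merge distance")
plt.show()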
