Uploaded by 97 Haseeb
Clustering

Hierarchical clustering
• Hierarchical clustering is a type of clustering
algorithm used in unsupervised machine learning to
group similar data points into clusters. Unlike other
clustering techniques like k-means, hierarchical
clustering creates a tree-like structure called
a dendrogram to represent how data points are
grouped together at various levels of granularity.
Types of Hierarchical Clustering
1. Agglomerative (Bottom-Up):
   • Starts with each data point as its own cluster.
   • Iteratively merges the closest clusters until only one cluster remains or a desired number of clusters is reached.
2. Divisive (Top-Down):
   • Starts with all data points in a single cluster.
   • Iteratively splits clusters into smaller clusters until each data point is its own cluster or a desired number of clusters is reached.
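As a sketch of the bottom-up (agglomerative) variant, assuming scikit-learn is available; the four toy points are made up for illustration and form two obvious pairs:

```python
# Hypothetical sketch: agglomerative (bottom-up) clustering with
# scikit-learn's AgglomerativeClustering on four toy 2-D points.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])

# Merge clusters bottom-up until only 2 clusters remain
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)
print(labels)  # the two tight pairs land in different clusters
```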
Steps in Agglomerative Hierarchical Clustering
1. Compute Pairwise Distances:
Calculate the distance (e.g., Euclidean, Manhattan) between all pairs of data points.
2. Merge Closest Clusters:
Find the two clusters with the smallest distance and merge them into a new cluster.
3. Update Distances:
Recalculate distances between the new cluster and all remaining clusters using a linkage criterion:
• Single Linkage: Minimum distance between points in two clusters.
• Complete Linkage: Maximum distance between points in two clusters.
• Average Linkage: Average distance between all points in two clusters.
• Ward's Method: Minimizes the increase in total within-cluster variance caused by a merge.
4. Repeat Until Stopping Criterion:
• Continue merging clusters until only one cluster remains or a
predefined number of clusters is reached.
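The steps above can be sketched with SciPy's `scipy.cluster.hierarchy.linkage`, which the deck refers to later; the point coordinates here are illustrative:

```python
# Minimal sketch of the agglomerative steps, assuming SciPy is available.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])

# Step 1: pairwise Euclidean distances (condensed form)
dists = pdist(X, metric="euclidean")

# Steps 2-4: linkage() repeatedly merges the two closest clusters,
# updating distances with the chosen criterion
# ("single", "complete", "average", or "ward")
Z = linkage(dists, method="average")
print(Z)  # each row: [cluster_i, cluster_j, merge_distance, new_size]
```

For n points, `linkage` returns n-1 merge rows; the last row joins the final two clusters.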
Key Output: Dendrogram
• A dendrogram is a tree-like diagram that shows the sequence of
merges and the distance at which clusters are merged.
• By cutting the dendrogram at a specific height, you can choose
the desired number of clusters.
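Cutting the dendrogram at a given height can be sketched with SciPy's `fcluster`; the data and cut height below are illustrative assumptions:

```python
# Hedged sketch: "cutting" the dendrogram at height 3.0 with fcluster,
# assuming SciPy is available. The toy data forms two separated pairs.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
Z = linkage(X, method="average")

# Merges above distance 3.0 are undone, leaving the two tight pairs
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)  # cluster ids start at 1
```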
Applications
• Customer Segmentation: Group customers based on
purchasing behavior.
• Document Clustering: Cluster documents based on
content similarity.
• Gene Expression Analysis: Group genes with similar
expression patterns.
• Image Segmentation: Identify similar regions in
images.
Advantages
• Does not require specifying the number of clusters in
advance.
• Captures hierarchical relationships among data points.
• Suitable for small-to-moderate datasets.
Disadvantages
• Computationally expensive for large datasets (O(n²) memory and typically O(n²) to O(n³) time).
• Sensitive to outliers.
• Choice of linkage criterion can significantly affect
results.
Example: Visualizing Hierarchical Clustering
• Generate data points.
• Compute a distance matrix.
• Use a hierarchical clustering algorithm (e.g.,
scipy.cluster.hierarchy in Python).
• Plot the dendrogram to analyze clusters.
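One possible end-to-end version of these four steps, assuming SciPy and matplotlib are available; the two-blob dataset and the output filename are made up for illustration:

```python
# Hedged end-to-end sketch: generate data, cluster, plot the dendrogram.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
# Two synthetic blobs of 10 points each
X = np.vstack([rng.normal(0.0, 0.5, (10, 2)),
               rng.normal(5.0, 0.5, (10, 2))])

# Ward linkage merges clusters so as to minimize the growth
# of within-cluster variance
Z = linkage(X, method="ward")

plt.figure(figsize=(8, 4))
dendrogram(Z)
plt.title("Hierarchical clustering dendrogram (Ward linkage)")
plt.xlabel("Data point index")
plt.ylabel("Merge distance")
plt.savefig("dendrogram.png")
```

Cutting this dendrogram below its tallest merge would recover the two blobs as separate clusters.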