
Hierarchical Clustering

Presented by:
Ravi Ranjan
Anisha Bharti
Runu Sukeerti
Clustering in Machine Learning
Definition: Unsupervised learning technique for grouping data points into clusters based on similarity.

Types of Clustering Methods

1 Partitioning Methods
The K-Means algorithm divides data into non-overlapping subsets. It aims to minimize within-cluster distances.

2 Hierarchical Clustering
Creates a tree of clusters. Includes agglomerative (bottom-up) and divisive (top-down) approaches.

3 Density-Based Methods
The DBSCAN algorithm finds clusters based on density. It can detect clusters of arbitrary shape.

4 Model-Based Clustering
Gaussian Mixture Models assume data comes from a mixture of probability distributions.
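As a quick illustration of the four families, the sketch below runs one representative algorithm from each on the same toy dataset; the data, parameter values, and use of scikit-learn are illustrative assumptions rather than part of the original slides.

# One representative algorithm per clustering family (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).rand(60, 2)                                         # toy 2-D data

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)   # partitioning
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)               # hierarchical
dbscan_labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)                   # density-based
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)      # model-based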
Hierarchical Clustering (Definition)
Hierarchical clustering is a method of clustering that builds a hierarchy of clusters. It
creates a tree-like structure (called a dendrogram), where data points are grouped
based on their similarity. It can either start by treating each data point as a separate
cluster and then merging them (agglomerative) or by treating all data points as a
single cluster and splitting them (divisive).

Need for Hierarchical Clustering

No need to specify the number of clusters: Unlike K-means, you don’t have to
define the number of clusters beforehand.

Dendrogram visualization: It provides a visual representation (dendrogram) that shows the cluster formation process and helps in determining the optimal number of clusters.

Suitable for small datasets: Works well with smaller datasets where visual interpretation and understanding of cluster hierarchies are important.

Captures nested clusters: Useful for data with a natural hierarchical structure, like
taxonomies or evolutionary trees.
Types of Hierarchical Clustering
Agglomerative Clustering: Bottom-Up Approach
How it Works: This method starts with each data point as its own individual cluster. For example, if you have 100 data
points, you will initially have 100 separate clusters.

Merging Process: Then, similar clusters are identified based on their proximity (using distance metrics like Euclidean
distance). After that, these similar clusters are merged to form larger clusters.

Step-by-Step Merging: This process continues until all clusters are merged into a single cluster that represents the
entire dataset.

When to Use: Agglomerative clustering is suitable for small to medium-sized datasets. This approach can be slower
and may not be efficient for large datasets.
Agglomerative Clustering: Bottom-Up Approach
Individual Clusters
Start with each data point as a separate cluster. For 100 points, begin with
100 clusters.

Similarity Calculation
Compute distances between clusters. Use metrics like Euclidean distance
to measure similarity.

Merging
Combine the most similar clusters. This process continues until one
cluster remains.

Dendrogram Analysis
Examine the dendrogram to determine optimal cluster number. Cut the
tree at appropriate level.
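A minimal sketch of these steps, assuming scikit-learn is available; the six example points and the choice of average linkage are illustrative.

# Agglomerative (bottom-up) clustering on six points that form two groups.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Each point starts as its own cluster; the closest clusters (Euclidean
# distance by default) are merged repeatedly until n_clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)
print(labels)   # e.g. [0 0 0 1 1 1]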
Divisive Clustering: Top-Down Approach
How it Works: In this method, the entire dataset is initially treated as a single cluster. This means all data points are
grouped together at the start.

Splitting Process: The single cluster is then gradually divided into smaller clusters based on their similarity or
distance. This process is recursive, meaning that at each step, a cluster is further split until each data point has its own
cluster.
Step-by-Step Splitting: Clusters are divided based on similarity or dissimilarity, and this continues until meaningful,
smaller clusters are formed.

Dendrogram: A dendrogram is also used in this method, but it starts from the top and moves downward as smaller
clusters are created. In this dendrogram, you can see where to stop splitting to obtain optimal clusters.

When to Use: Divisive clustering is generally used when you have a large dataset or when you want to display a clear hierarchical structure. It can be more computationally intensive than the agglomerative approach, but it can still be effective at that scale.

Divisive Clustering: Top-Down Approach
1 Single Cluster
Start with all data points in one cluster. This represents
the entire dataset.

2 Splitting Process
Divide the cluster based on dissimilarity. Create
smaller, more homogeneous groups.

3 Recursive Division
Continue splitting until each data point is its own cluster, or stop at the desired level.
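Neither SciPy nor scikit-learn ships a ready-made divisive algorithm, so the sketch below illustrates the top-down idea with a simple bisecting heuristic: the largest cluster is repeatedly split in two with 2-means. The splitting rule and the toy data are assumptions for illustration, not something the slides prescribe.

# Divisive (top-down) clustering via repeated bisection (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    clusters = [np.arange(len(X))]                       # start: one cluster holding every point
    while len(clusters) < n_clusters:
        # split the largest remaining cluster into two
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[split == 0])
        clusters.append(members[split == 1])
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

X = np.vstack([np.random.RandomState(seed).normal(center, 0.3, size=(20, 2))
               for seed, center in enumerate([0, 3, 6])])
print(divisive_clustering(X, 3))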
Distance Metrics and Linkage Criteria
Understanding distance metrics and linkage criteria is crucial in
the field of clustering, as they determine how similarity between
data points is measured and how clusters are formed. This
overview explores the key concepts and their applications.
Euclidean Distance
Definition: Euclidean distance is the straight-line distance between two points in a multidimensional space. It is the most commonly used distance metric in clustering algorithms.

Use Cases: Euclidean distance is particularly effective when the data has a clear cluster structure and the clusters are compact and well-separated.

Limitations: It can be sensitive to differences in scale and may not perform well when the data has varying densities or complex shapes.
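For two example points a and b, the computation looks like this (the vectors are arbitrary):

# Euclidean distance: square root of the summed squared coordinate differences.
import numpy as np
from scipy.spatial.distance import euclidean

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

d_manual = np.sqrt(np.sum((a - b) ** 2))
d_scipy = euclidean(a, b)
print(d_manual, d_scipy)   # both 5.0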
Manhattan Distance
Definition: Also known as the city block distance, Manhattan distance measures the sum of the absolute differences between the coordinates of two points.

Characteristics: Manhattan distance is less sensitive to outliers and more robust to noise than Euclidean distance.

Applications: It is often used in tasks where the data has a grid-like structure, such as image processing and text analysis.

Comparison: Manhattan distance is generally more computationally efficient than Euclidean distance, making it a popular choice for large-scale clustering problems.
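The same two example points give a Manhattan distance of 7 (the vectors are again arbitrary):

# Manhattan (city block) distance: sum of absolute coordinate differences.
import numpy as np
from scipy.spatial.distance import cityblock

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

d_manual = np.sum(np.abs(a - b))   # |1-4| + |2-6| + |3-3| = 7
d_scipy = cityblock(a, b)
print(d_manual, d_scipy)           # both 7.0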
Cosine Similarity

Vector Orientation
Cosine similarity measures the cosine of the angle between two non-zero
vectors, focusing on their orientation rather than magnitude.

Text Analysis
It is commonly used in text mining and information retrieval, where it measures
the similarity between document vectors.

Clustering Preference
Cosine similarity is often preferred when the data has no clear scale, such as in
high-dimensional text data or gene expression data.
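A small sketch of the calculation; the two vectors are arbitrary and chosen to point in the same direction.

# Cosine similarity: cosine of the angle between two vectors, ignoring magnitude.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the length

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)                  # 1.0: identical orientation despite different magnitudes

The corresponding distance used for clustering is 1 minus this value, which SciPy exposes as scipy.spatial.distance.cosine.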
Linkage Criteria
1 Single Linkage
Merges clusters based on the minimum distance
between any two points in the clusters.

2 Complete Linkage
Merges clusters based on the maximum distance
between any two points in the clusters.

3 Average Linkage
Merges clusters based on the average distance
between all pairs of points in the clusters.
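The sketch below clusters the same synthetic data with each of the three criteria using SciPy; the data and the choice of three clusters are illustrative assumptions.

# Compare single, complete, and average linkage on the same data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.vstack([np.random.RandomState(seed).normal(center, 0.5, size=(15, 2))
               for seed, center in enumerate([0, 4, 8])])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method, metric="euclidean")   # full merge history
    labels = fcluster(Z, t=3, criterion="maxclust")     # cut the tree into 3 clusters
    print(method, np.bincount(labels)[1:])              # resulting cluster sizes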
Single Linkage
1 Minimum Distance
Single linkage uses the minimum distance between any two points in the clusters to determine when to merge them.

2 Chaining Effect
This method can be sensitive to outliers and may result in long, chain-like clusters.

3 Applications
Single linkage is often used in exploratory data analysis to identify potential cluster structures.
Complete Linkage
Maximum Distance
Complete linkage uses the maximum distance between
any two points in the clusters to determine when to
merge them.

Compact Clusters
This method tends to produce more compact, spherical
clusters by favoring the merger of clusters with the
smallest maximum distance.

Sensitivity to Outliers
Complete linkage is more sensitive to outliers than
single linkage, as it is influenced by the maximum
distance between points.
Average Linkage
Compromise
Average linkage is a compromise between single and complete linkage, using the average distance between all pairs of points in the clusters.

Flexibility
This method can produce clusters with varying shapes and sizes, making it a versatile choice for many clustering problems.

Robustness
Average linkage is generally less sensitive to outliers and noise than other linkage methods.
Ward's Method: Principles and Assumptions
Ward's method is a hierarchical clustering algorithm that uses a
specific distance metric to group data points. The core principle is to
minimize the variance within clusters, which leads to a more
balanced and cohesive clustering.

by Runu Sukeerti
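SciPy implements Ward's criterion directly; the short sketch below (with an assumed toy dataset) shows how each merge is chosen to minimize the increase in within-cluster variance.

# Ward's method: merge the pair of clusters with the smallest variance increase.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.vstack([np.random.RandomState(seed).normal(center, 0.4, size=(10, 2))
               for seed, center in enumerate([0, 3, 6])])

Z = linkage(X, method="ward")   # Ward's method requires Euclidean distances
print(Z[:5])                    # each row: [cluster_a, cluster_b, merge distance, new cluster size]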
Constructing a Dendrogram
1 Initial Steps
Each data point starts as its own cluster.

2 Merging Clusters
At each step, the two closest clusters are merged based on Ward's distance criterion (the smallest increase in within-cluster variance).

3 Hierarchical Structure
The merging process continues until all data points
are in one cluster, forming a hierarchical tree
structure.
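These steps map directly onto SciPy's linkage and dendrogram functions; the data and plot labels below are illustrative, and matplotlib is assumed to be installed.

# Build the merge hierarchy with Ward's method and draw the dendrogram.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.vstack([np.random.RandomState(seed).normal(center, 0.4, size=(8, 2))
               for seed, center in enumerate([0, 3, 6])])

Z = linkage(X, method="ward")   # merge history, one row per merge
dendrogram(Z)                   # leaves are data points; height is the merge distance
plt.xlabel("Data point index")
plt.ylabel("Merge distance")
plt.show()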
Interpreting Dendrogram Structures
Branching Patterns: The dendrogram's branching patterns reveal the relationships between clusters and their proximity.

Cluster Height: The height of each cluster indicates the distance between merged clusters, providing insights into their similarity.

Cluster Size: The size of a cluster indicates the number of data points belonging to it.
Practical Applications of Dendrograms
Customer Segmentation
Group customers based on their purchasing behavior or
demographics for targeted marketing.

Gene Expression Analysis
Identify genes with similar expression patterns for understanding biological processes.

Image Analysis
Cluster pixels or regions of an image to segment objects
or identify patterns.
Limitations and Considerations

High Computational Cost
Ward's method can be computationally expensive for large datasets.

Sensitivity to Outliers
Outliers can distort the dendrogram and affect clustering results.

Difficulty with Complex Data
Ward's method might struggle with data that has non-spherical clusters or varying densities.
Selecting Optimal Clusters from a Dendrogram
Dendrograms, tree-like diagrams, visually represent hierarchical
clustering. Selecting the optimal number of clusters is crucial for
insightful analysis.
Determining the Optimal Number of Clusters
Elbow Method: Look for a distinct "elbow" in the dendrogram where the rate of change in cluster distance decreases significantly.

Cut-Off Height: Choose a specific height on the dendrogram and cut the tree horizontally; the height represents the similarity threshold, and the branches below it become the clusters.

Domain Knowledge: Consider prior knowledge about the data and the desired number of clusters based on the problem's context.
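In code, cutting the tree at a height or asking for a fixed number of clusters both map onto SciPy's fcluster; the cut height of 2.5 and the choice of three clusters below are illustrative assumptions.

# Extract flat clusters from the dendrogram by height or by cluster count.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.vstack([np.random.RandomState(seed).normal(center, 0.4, size=(10, 2))
               for seed, center in enumerate([0, 4, 8])])
Z = linkage(X, method="ward")

labels_by_height = fcluster(Z, t=2.5, criterion="distance")   # cut horizontally at height 2.5
labels_by_count = fcluster(Z, t=3, criterion="maxclust")      # request exactly 3 clusters
print(np.bincount(labels_by_count)[1:])                       # sizes of the 3 clusters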
Advantages of Hierarchical Clustering

No Predefined Cluster Number and Cluster Size
Hierarchical clustering does not require prior knowledge of the number of clusters.

Visual Insights
Dendrograms provide a visual representation of the clustering process, aiding in understanding the relationships between data points.

Hierarchical Structure and Unsupervised Learning
Hierarchical clustering allows for understanding the relationships between clusters at different levels of granularity.
Limitations and Disadvantages of Hierarchical Clustering
Computational Complexity
Hierarchical clustering can be computationally expensive, especially for large datasets.

Irreversible Merging
Once clusters are merged, they cannot be undone, making it difficult to adjust the
clustering based on new insights.

Sensitivity to Outliers
Outliers can significantly influence the clustering results, leading to inaccurate cluster
formations.
Conclusion and Future Directions

Hierarchical clustering is a versatile method for exploring data and identifying underlying relationships.

Further research can focus on improving computational efficiency and developing methods to handle outliers effectively.

Integrating hierarchical clustering with other techniques, such as k-means clustering, can lead to hybrid approaches for improved accuracy and interpretability.
