
• Hierarchical clustering is a method for grouping similar objects into clusters, forming a hierarchical, tree-like representation called a dendrogram. It does not require the number of clusters to be specified beforehand, making it useful for exploratory data analysis. Hierarchical clustering can be broadly classified into two types: agglomerative (bottom-up) and divisive (top-down).
• The hierarchical clustering technique has two approaches:

• Agglomerative: a bottom-up approach in which the algorithm starts with every data point as its own cluster and merges the closest pairs until only one cluster is left.
• Divisive: the reverse of the agglomerative algorithm; a top-down approach that starts with all data points in a single cluster and recursively splits it.
• Agglomerative Hierarchical Clustering

• The agglomerative hierarchical clustering algorithm is a popular example of HCA (Hierarchical Cluster Analysis). To group the data into clusters, it follows the bottom-up approach: the algorithm treats each data point as a single cluster at the beginning and then repeatedly merges the closest pair of clusters. It continues until all clusters are merged into a single cluster containing the entire dataset.
• This hierarchy of clusters is represented in the form of a dendrogram.
• The dendrogram is a tree-like structure that records each merge step performed by the hierarchical clustering algorithm. In a dendrogram plot, the y-axis shows the distance (e.g., Euclidean) at which clusters were merged, and the x-axis shows the individual data points of the dataset.
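As a concrete illustration of the dendrogram described above, the sketch below builds one with SciPy's hierarchical-clustering utilities. The six 2-D points and the choice of Ward linkage are assumptions for the example; any dataset and linkage method could be substituted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Six 2-D points forming two loose groups (illustrative data).
X = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
              [8.0, 8.0], [8.5, 8.2], [7.8, 7.9]])

# Agglomerative clustering: each row of Z records one merge
# (the two clusters joined and the distance at which they merged).
Z = linkage(X, method="ward")

# no_plot=True returns the tree layout (leaf order, merge heights)
# without drawing; the same call without it renders via matplotlib.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right dendrogram order
```

The merge distances in `Z` are what the y-axis of the plotted dendrogram displays.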
• The working of the algorithm can be summarized in the following steps:
• Step 1: Compute the proximity matrix using a chosen distance metric
• Step 2: Assign each data point to its own cluster
• Step 3: Merge the two clusters that are most similar according to a linkage criterion
• Step 4: Update the proximity matrix
• Step 5: Repeat Steps 3 and 4 until only a single cluster remains
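The steps above can be sketched from scratch as follows. Single linkage (distance between the closest members of two clusters) and Euclidean distance are assumptions for this sketch; other linkage criteria and metrics work the same way.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two points given as tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points):
    # Step 2: each data point starts in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Steps 1 & 4: (re)compute pairwise cluster distances
        # (single linkage: distance between the closest members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step 3: merge the closest pair of clusters.
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # Step 5: the loop repeats until one cluster remains.
    return merges

history = agglomerative([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)])
print(history[-1])  # the final merge joins the two groups
```

Recomputing all pairwise distances each pass makes this O(n³); library implementations cache and update the proximity matrix instead, which is the optimization Step 4 refers to.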
APPLICATIONS:
• Biology and Bioinformatics: Hierarchical clustering is widely used in genomics and bioinformatics to analyze gene expression data, identify gene regulatory networks, and classify biological samples based on their expression profiles.
• Marketing and Customer Segmentation: In marketing, hierarchical clustering is applied to segment customers based on their purchasing behavior, demographics, or preferences. This information can be used for targeted marketing campaigns and product recommendations.
• Image Analysis and Computer Vision: Hierarchical clustering is used in image processing and computer vision for tasks such as image segmentation, object recognition, and content-based image retrieval.
• Text Mining and Document Clustering: In text analysis, hierarchical clustering is applied to group similar documents or words together, enabling tasks such as document clustering, topic modeling, and sentiment analysis.
• Social Network Analysis: Hierarchical clustering can be used to analyze social networks by grouping users or communities based on their interactions or network properties.

ADVANTAGES:
• No Need for a Predefined Number of Clusters: Hierarchical clustering does not require specifying the number of clusters beforehand, making it suitable for exploratory data analysis.
• Hierarchical Representation: Hierarchical clustering provides a hierarchical structure (dendrogram) that can be visually inspected to understand the relationships between clusters at different levels of granularity.
• Interpretability: The hierarchical structure produced by hierarchical clustering can be interpreted to gain insights into the natural grouping of data points.
• Flexibility: Hierarchical clustering can handle different types of data (e.g., numerical, categorical) and distance metrics, allowing for a flexible approach to clustering.
DISADVANTAGES:
• Computational Complexity: Agglomerative hierarchical clustering can be computationally expensive, especially for large datasets, as it requires calculating pairwise distances between all data points or clusters.
• Memory Usage: Hierarchical clustering may require storing the entire distance matrix in memory, making it memory-intensive for large datasets.
• Difficulty in Handling Noise and Outliers: Hierarchical clustering may struggle with noisy or outlier data points, as they can affect the merging process and lead to suboptimal clustering results.
• Subjectivity in Dendrogram Interpretation: Interpreting the dendrogram produced by hierarchical clustering can be subjective, and determining the appropriate number of clusters or the level of granularity can be challenging.