HIERARCHICAL CLUSTERING
HIERARCHICAL CLUSTERING ALGORITHM
Hierarchical clustering, also called hierarchical cluster analysis (HCA), is an
unsupervised clustering algorithm that builds clusters with a
predominant ordering from top to bottom.
For example, the files and folders on a hard disk are organized in a hierarchy.
The algorithm groups similar objects into sets called clusters. The
endpoint is a set of clusters in which each cluster is distinct
from the others, and the objects within each cluster are broadly
similar to each other.
This clustering technique is divided into two types:
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
AGGLOMERATIVE HIERARCHICAL CLUSTERING
Agglomerative hierarchical clustering is the most common
type of hierarchical clustering, used to group objects into clusters
based on their similarity. It is also known as AGNES (Agglomerative
Nesting). It is a “bottom-up” approach: each observation starts
in its own cluster, and pairs of clusters are merged as one
moves up the hierarchy.
HOW DOES IT WORK?
Step 1: Make each data point a single-point cluster → forms N clusters.
Step 2: Take the two closest data points and make them one cluster →
forms N-1 clusters.
Step 3: Take the two closest clusters and make them one cluster → forms
N-2 clusters.
Step 4: Repeat Step 3 until you are left with only one cluster.
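The steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration and not a reference implementation: it assumes Euclidean distance and measures the distance between two clusters by their closest pair of points. The names single_link and agglomerate are made up for this example. Merging the two closest data points and merging the two closest clusters are handled by the same loop, because every point starts out as a one-point cluster.

```python
from itertools import combinations
from math import dist


def single_link(a, b):
    """Cluster distance (assumed here): distance of the closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge the two closest clusters repeatedly; return the merges in order."""
    # Every point starts as its own cluster, so there are N clusters.
    clusters = [[p] for p in points]
    merges = []
    # Each pass merges one pair, so the count goes N, N-1, ..., 1.
    while len(clusters) > 1:
        # Pick the pair of cluster indices with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # i < j, so delete j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = agglomerate(points)
for left, right in merges:
    print(left, "+", right)
```

With N points the loop always performs N-1 merges. The order of those merges is the hierarchy that the algorithm produces.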
LINKAGE METHODS FOR CLUSTER OBSERVATIONS
There are several ways to measure the distance between clusters, and the
choice determines which clusters are merged at each step. These measures
are called linkage methods. Some of the common linkage methods are:
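Three widely used choices are single, complete and average linkage, which can be sketched as follows. The sketch assumes Euclidean distance and clusters stored as lists of coordinate tuples. The function names are illustrative and do not come from any particular library.

```python
from math import dist


def single_linkage(a, b):
    # Smallest distance between any point in a and any point in b.
    return min(dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    # Largest distance between any point in a and any point in b.
    return max(dist(p, q) for p in a for q in b)


def average_linkage(a, b):
    # Mean of all pairwise distances between the two clusters.
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b))
```

For any pair of clusters the average-linkage value lies between the single-linkage and complete-linkage values, so the three rules can disagree about which pair is "closest" and therefore produce different hierarchies.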
Step 2: After each iteration, remove the “outsiders” from the least
cohesive cluster.