Partitioning Clustering
Partitioning clustering methods divide a dataset into distinct groups (clusters) such that data
points in the same group are more similar to each other than to those in different groups. The
goal is often to minimize some criterion, like the sum of squared errors (SSE). Here are three
common methods:
1. K-Means Clustering
Concept: K-means partitions data into k clusters, where each cluster is represented by
its centroid (the mean of the points in the cluster).
Algorithm:
1. Choose k initial centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate centroids as the mean of all points assigned to them.
4. Repeat steps 2-3 until centroids no longer change significantly.
Criterion: Minimizes the sum of squared distances between points and their assigned
cluster centroid.
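Written out, the criterion described above is the within-cluster sum of squared errors, which can be expressed as
SSE = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
where C_i is the set of points assigned to cluster i and \mu_i is its centroid (mean).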
2. K-Medoids Clustering
Concept: Similar to k-means, but instead of centroids (mean values), it selects actual data
points (medoids) to represent clusters.
Algorithm:
1. Initialize k medoids (representative points from the dataset).
2. Assign each point to the closest medoid.
3. Swap medoids with non-medoid points to see if the clustering improves (lower
total dissimilarity).
4. Repeat until there are no more beneficial swaps.
Advantage: More robust to outliers than k-means, because clusters are represented by actual data points (medoids) rather than means.
CLARA (Clustering Large Applications), a sampling-based extension of PAM for large data sets, draws multiple samples of the data set, applies PAM to each sample, and returns the best clustering found as the output.
Weakness: A good clustering based on samples will not necessarily represent a good clustering of the whole data
set if the sample is biased.
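To make the swap step above concrete, here is a minimal PAM-style k-medoids sketch in NumPy (my own illustration, not code from the notes; the greedy swap loop is a simplified version of full PAM):

import numpy as np

def pam(X, k, max_iter=100):
    """Simplified PAM: greedily swap medoids with non-medoids while total cost drops."""
    rng = np.random.default_rng(0)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    medoids = list(rng.choice(n, size=k, replace=False))

    def total_cost(meds):
        # Each point contributes its distance to the closest medoid.
        return d[:, meds].min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = h                       # try swapping medoid i with point h
                c = total_cost(candidate)
                if c < cost:                           # keep the swap only if it lowers the total cost
                    medoids, cost, improved = candidate, c, True
        if not improved:                               # no beneficial swap left: stop
            break
    labels = d[:, medoids].argmin(axis=1)
    return medoids, labels

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
print(pam(X, k=3))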
Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9).
Suppose that the initial seeds (centers of each cluster) are A1, A4 and A7. Run the k-means algorithm for
1 epoch only. At the end of this epoch show:
a) The new clusters (i.e. the examples belonging to each cluster)
b) The centers of the new clusters
c) Draw a 10 by 10 space with all the 8 points and show the clusters after the first epoch and the new
centroids.
d) How many more iterations are needed to converge? Draw the result for each epoch
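A quick way to check parts (a) and (b) is to run a single assignment/update step in NumPy (a verification sketch assuming standard k-means with Euclidean distance, as the exercise states):

import numpy as np

# The 8 examples and the initial seeds A1, A4, A7 from the exercise.
points = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
names = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
centers = points[[0, 3, 6]]                      # A1, A4, A7

# Assignment step: each point goes to its nearest center.
dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
labels = dists.argmin(axis=1)

# Update step: each center becomes the mean of its assigned points.
new_centers = np.array([points[labels == j].mean(axis=0) for j in range(3)])

for j in range(3):
    members = [n for n, l in zip(names, labels) if l == j]
    print(f"cluster {j + 1}: {members}, new center: {new_centers[j]}")

Running this gives cluster 1 = {A1}, cluster 2 = {A3, A4, A5, A6, A8}, cluster 3 = {A2, A7}, with new centers (2, 10), (6, 6) and (1.5, 3.5).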
Density-based clustering algorithms
Density-based clustering groups data based on areas of high density, separating out low-density areas as
noise or outliers. These methods are particularly good for discovering clusters of arbitrary shape and
handling noise.
📌 Common Methods:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Key idea: Clusters are formed from regions of high density separated by regions of low density.
Parameters: ε (Eps), the radius of a point's neighbourhood, and MinPts, the minimum number of points required to make that neighbourhood dense.
Steps:
1. For each point, find all neighbours within distance ε.
2. Mark points with at least MinPts neighbours as core points.
3. Form clusters by connecting core points that lie within ε of each other.
4. Border points are assigned to the nearest core cluster; noise is discarded.
OPTICS
Produces a reachability plot rather than explicit clusters; clusters can be extracted from the plot later.
📈 Advantages: Finds clusters of arbitrary shape, handles noise and outliers, and does not require the number of clusters to be specified in advance.
⚠️ Disadvantages: Results are sensitive to the choice of ε and MinPts, and clusters of widely varying density are difficult to capture with a single parameter setting.
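As a quick illustration (my own example, not from the notes), DBSCAN from scikit-learn can be run on a toy dataset; the eps and min_samples values below are arbitrary choices for this data:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups of points plus one isolated point that should end up as noise.
X = np.array([
    [1.0, 1.0], [1.2, 1.1], [0.9, 1.3], [1.1, 0.8],
    [5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],
    [9.0, 0.0],                                    # isolated point
])

db = DBSCAN(eps=0.6, min_samples=3).fit(X)         # eps plays the role of ε, min_samples of MinPts
print(db.labels_)                                  # cluster id per point; -1 marks noise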
Hierarchical clustering
Hierarchical clustering creates a hierarchy of clusters in the form of a tree structure called a dendrogram. Unlike k-means clustering, hierarchical clustering does not require specifying the number of clusters beforehand.
Agglomerative clustering (bottom-up) starts with each data point as its own cluster and merges the most similar clusters at each step until only one cluster remains.
1. Start with each data point as a separate cluster.
2. Compute the distances between all pairs of clusters.
3. Merge the two closest (most similar) clusters.
4. Repeat steps 2–3 until all points are in one cluster or the desired number of clusters is reached.
5. Dendrogram Analysis: The hierarchical structure can be visualized using a dendrogram, which can be cut at different levels to obtain different numbers of clusters.
Linkage criteria (how the distance between two clusters is measured):
Single Linkage: Distance between the closest (nearest) points of two clusters.
Average Linkage: Average of all pairwise distances between points in two clusters.
Centroid Linkage: Distance between the centroids (mean points) of two clusters.
Ward's Method: Minimizes the increase in within-cluster variance at each merge, forming compact groups.
Example:
Consider five points in 2D space. Using single linkage, the two closest points merge first, and this
process continues iteratively.
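A short sketch of this with SciPy (my own illustration; the five points are arbitrary). linkage with method="single" implements single linkage, and the dendrogram visualizes the merge order:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

pts = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 4.8], [9, 9]], dtype=float)

Z = linkage(pts, method="single")                 # merge clusters by their closest pair of points
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into at most 3 clusters
print(labels)

dendrogram(Z)                                     # tree of merges; the cut height controls the cluster count
plt.show()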
🔹 Disadvantages: Computationally expensive for large datasets (the naive algorithm needs O(n²) memory and roughly O(n³) time), and a merge, once made, cannot be undone.
Divisive clustering takes the opposite approach to agglomerative clustering. It starts with one large cluster containing all the data and splits it iteratively into smaller clusters until each data point is its own cluster.
1. Start with all data points in a single cluster.
2. Split a cluster into two sub-clusters (for example using a flat clustering method, or the PCA-based strategy below).
3. Repeat the process recursively until each data point is its own cluster.
4. Dendrogram Analysis: Like agglomerative clustering, we can cut the dendrogram at a suitable level to determine clusters.
One common splitting strategy is to use Principal Component Analysis (PCA) to find the direction along which a cluster is best split.
🔹 Disadvantages
🚫 Not recommended for very large datasets due to high computational cost.
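As a sketch of the top-down idea (my own illustration, using a 2-means split instead of PCA, a variant usually called bisecting k-means):

import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters):
    """Divisive (top-down) clustering: repeatedly split the largest cluster with 2-means."""
    clusters = [np.arange(len(X))]                               # start with all points in one cluster
    while len(clusters) < n_clusters:
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)                              # take the largest cluster
        split = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[split == 0])                     # replace it with its two halves
        clusters.append(members[split == 1])
    return clusters

X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
for i, c in enumerate(bisecting_kmeans(X, 3)):
    print(f"cluster {i + 1}: point indices {c.tolist()}")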
https://www.youtube.com/watch?v=oNYtYm0tFso
https://www.youtube.com/watch?v=0A0wtto9wHU
https://www.youtube.com/watch?v=35VgJ84sqqI
https://www.youtube.com/watch?v=jcdT_pVRqlE