Exploring Unsupervised Learning Algorithms with the Iris Dataset
○ It has clear, distinct clusters, at least in the petal and sepal measurements.
○ Small, making it computationally easy to handle.
○ Its inherent structure makes it well suited for evaluating and comparing clustering
algorithms.
K-means Clustering
○ k denotes the number of clusters. Choosing the right k is important and can be
guided by methods such as the elbow method, silhouette analysis, and the gap
statistic.
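The elbow method and silhouette analysis mentioned above can be sketched as follows; this is a minimal illustration on the Iris data, and the range of candidate k values is an assumption, not a recommendation.

```python
# Illustrative sketch: scanning candidate values of k on the Iris data.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = load_iris().data

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Inertia (within-cluster sum of squares) is what the elbow plot tracks;
    # the mean silhouette coefficient is what silhouette analysis tracks.
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```

In practice one would plot inertia against k and look for the "elbow", and prefer the k with the highest mean silhouette.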
3. Handling overlapping clusters: Standard K-means assumes clusters are spherical and
non-overlapping. Because it assigns points purely by distance, it may misassign points
that lie in an overlapping region.
4. Effect of different initial centroids: K-means is sensitive to the initial positions of
the centroids and can converge to different results across runs. Techniques such as
k-means++ initialization improve consistency.
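The sensitivity to initialization can be seen in a small sketch; this compares a single random start with k-means++ plus restarts on Iris (the specific seeds and settings are illustrative assumptions).

```python
# Minimal sketch: random initialization vs. k-means++ with restarts.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data

# A single random start may land in a poor local optimum.
random_init = KMeans(n_clusters=3, init="random", n_init=1, random_state=0).fit(X)
# k-means++ seeding, with n_init restarts keeping the best run.
pp_init = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

# Lower inertia is better; the k-means++ result is typically at least as good.
print(random_init.inertia_, pp_init.inertia_)
```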
DBSCAN
○ Well suited to clustering arbitrarily shaped groups and to handling noise.
○ Struggles with datasets that have varying densities or are high-dimensional.
3. Noise identification: DBSCAN labels points that do not belong to any cluster as
noise. Unlike K-means, it does not force every point into a cluster, which makes it
less sensitive to outliers.
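The noise-labelling behaviour can be demonstrated directly; in this sketch the `eps` and `min_samples` values are assumed for illustration, not tuned recommendations.

```python
# Illustrative sketch: DBSCAN on Iris; points fitting no cluster get label -1.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import DBSCAN

X = load_iris().data
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# -1 is DBSCAN's noise label, so it is excluded from the cluster count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```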
Hierarchical Clustering
○ Single linkage: Merges clusters based on their nearest points and can build
elongated, chain-like clusters.
○ Complete linkage: Merges clusters based on their farthest points; final clusters are
compact.
○ Average linkage: A balance between the two.
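The three linkage criteria above can be compared in a short sketch; the choice of three clusters mirrors the three Iris species and is otherwise an assumption.

```python
# Hedged sketch: agglomerative clustering on Iris under each linkage criterion.
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

X = load_iris().data

for linkage in ("single", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    # Cluster sizes reveal the linkage behaviour: single linkage tends to
    # chain most points into one large cluster on this data.
    sizes = sorted(int((labels == c).sum()) for c in set(labels))
    print(linkage, sizes)
```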
Mean Shift
1. Cluster centroids: Mean Shift locates density peaks, known as modes, by iteratively
shifting points in the direction of high point density.
2. Bandwidth parameter: The bandwidth dictates the sphere of influence of each point;
a larger bandwidth generally yields fewer clusters, and a smaller one more.
3. Handling non-spherical clusters: Unlike K-means, Mean Shift can capture clusters of
arbitrary shape, since it is not limited to finding clusters of a particular geometry.
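The mode-seeking procedure can be sketched as below; the `quantile` value passed to the bandwidth estimator is an assumed setting, included only to show how bandwidth controls the cluster count.

```python
# Sketch under assumptions: Mean Shift on Iris with an estimated bandwidth.
from sklearn.datasets import load_iris
from sklearn.cluster import MeanShift, estimate_bandwidth

X = load_iris().data

# A larger quantile gives a larger bandwidth, and hence fewer clusters.
bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)

# Each discovered mode becomes a cluster center.
print(len(ms.cluster_centers_))
```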
Gaussian Mixture Models (GMM)
1. Data modeling: GMM models the data as a mixture of several Gaussian densities,
giving a probabilistic clustering method.
3. Handling varying shapes and sizes: Another advantage of GMM is its ability to
model clusters of varying shapes and sizes, owing to its probabilistic approach and
the flexibility of its covariance matrices.
Comparative Analysis
○ K-means may have problems with overlapping clusters.
○ DBSCAN can discover noise, but it struggles to find clusters of differing
densities in the data.
○ Hierarchical clustering facilitates the assessment of relationships through an
easy-to-understand tree diagram (dendrogram).
○ GMM is flexible in the shapes of clusters it can model.
2. Best algorithm for Iris species: Performance varies with the chosen parameters;
GMM's main advantage may be how well it separates overlapping clusters, since it is
based on probability distributions.
○ K-means: Fast and simple; performs poorly on clusters that are not spherical in
shape.
○ DBSCAN: Handles noise well; parameter tuning is a concern.
○ Hierarchical: Interpretable; computationally expensive on large datasets.
○ GMM: Flexible, but computationally more demanding.
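The trade-offs above can be made concrete by running all four algorithms on Iris and scoring each against the true species with the adjusted Rand index; the parameter values below are assumptions for illustration, not tuned settings.

```python
# A minimal comparative sketch of the four algorithms discussed above.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

models = {
    "K-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "GMM": GaussianMixture(n_components=3, random_state=0),
}

for name, model in models.items():
    # All four estimators support fit_predict for hard labels.
    labels = model.fit_predict(X)
    # Adjusted Rand index: agreement with the true species, chance-corrected.
    print(name, round(adjusted_rand_score(y, labels), 3))
```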
1. Visualization: