ML Module 4 Unsupervised Learning - Updated
Introduction to Clustering
Types of Clustering
Partitioning-based Clustering
K-means Algorithm
Fuzzy Clustering
Fuzzy C-Means Algorithm
CLUSTERING: TYPES
Hierarchical clustering:
Also known as 'nested clustering', since smaller clusters can exist within bigger clusters, forming a tree. Example: Agglomerative Clustering
Fuzzy clustering:
It reflects the fact that an object can simultaneously belong to more than one cluster, with a degree of membership in each.
Density-based clustering:
The data space is searched for regions where data points are densely packed, separated by regions of low density. Example: DBSCAN
Model-based clustering:
It assumes the data are generated by an underlying statistical model, which provides a framework for incorporating our knowledge about a domain.
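A common instance of model-based clustering is a Gaussian mixture model. The sketch below is a minimal illustration using scikit-learn's GaussianMixture on hypothetical 2-D data; the library choice and the data are assumptions, not part of the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(50, 2)),
               rng.normal([4, 4], 0.5, size=(50, 2))])

# Fit a two-component Gaussian mixture; each component models one cluster.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)          # estimated cluster centres
print(gmm.predict(X[:5]))  # hard cluster assignments for the first points
```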
PARTITION-BASED CLUSTERING
EXAMPLE: K-MEANS
K-MEANS CLUSTERING
An iterative, partition-based clustering algorithm:
Each cluster is associated with a centroid.
Each point is assigned to the cluster with the closest centroid.
The number of clusters, K, must be specified in advance.
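As a quick usage sketch, scikit-learn's KMeans follows exactly this interface: you specify K up front and get back centroids and per-point assignments. The data points here are hypothetical stand-ins; the library choice is an assumption, not part of the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (height, weight) points.
X = np.array([[185, 72], [170, 56], [168, 60], [179, 68]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid per cluster
print(km.labels_)           # cluster index assigned to each point
```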
K-MEANS CLUSTERING
1. Initial centroids are often chosen randomly, so the clusters produced vary from one run to another.
2. The centroid is (typically) the mean of the points in the
cluster.
3. “Closeness” is measured by Euclidean distance, cosine
similarity, correlation, etc.
4. K-means will converge for common similarity measures
mentioned above.
5. Most of the convergence happens in the first few iterations, so the stopping condition is often relaxed to "until relatively few points change clusters".
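The numbered points above map directly onto a short from-scratch implementation. The NumPy sketch below is illustrative rather than the module's canonical code; the 1% threshold is an assumed stand-in for "relatively few points change clusters".

```python
import numpy as np

def kmeans(X, k, max_iter=100, change_frac=0.01, seed=0):
    """Minimal K-means sketch following the numbered points above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) Initial centroids chosen randomly from the data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # (3) "Closeness" here is Euclidean distance to each centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # (5) Stop once relatively few points change clusters.
        if np.mean(new_labels != labels) <= change_frac:
            labels = new_labels
            break
        labels = new_labels
        # (2) Each centroid becomes the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```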
K-MEANS CLUSTERING: EXAMPLE
Given: Number of clusters, K = 2
Data points:
Solution:
Step 1: Initial centroids: K1 = (185, 72), K2 = (170, 56)
Step 2: Calculate the Euclidean distance from each centroid to each data point, and assign the point to the nearer centroid.
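Since the original data table did not survive extraction, the sketch below carries out Step 2 for the given centroids against hypothetical (height, weight) points of the same form.

```python
import numpy as np

# Given initial centroids from Step 1.
k1 = np.array([185, 72])
k2 = np.array([170, 56])

# Hypothetical stand-ins for the lost data table.
points = np.array([[168, 60], [179, 68], [182, 72], [188, 77]])

for p in points:
    d1 = np.linalg.norm(p - k1)   # Euclidean distance to centroid K1
    d2 = np.linalg.norm(p - k2)   # Euclidean distance to centroid K2
    cluster = "K1" if d1 < d2 else "K2"
    print(p, round(d1, 2), round(d2, 2), "->", cluster)
```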
Step 3: Final cluster allocation
K-MEANS ADVANTAGES
Relatively simple to implement.
Scales to large data sets.
Guarantees convergence.
Can warm-start the positions of centroids.
Easily adapts to new examples.
Generalizes to clusters of different shapes and
sizes, such as elliptical clusters.
K-MEANS DISADVANTAGES
Choosing k manually.
Being dependent on initial values.
For a low k, you can mitigate this dependence by running k-means
several times with different initial values and picking the best result.
Clustering data of varying sizes and density.
k-means has trouble clustering data where clusters are of varying sizes
and density.
Clustering outliers.
Centroids can be dragged by outliers, or outliers might get their own
cluster instead of being ignored. Consider removing or clipping outliers
before clustering.
Scaling with number of dimensions.
As the number of dimensions increases, a distance-based similarity measure converges to a constant value between any given pair of examples, as illustrated below.
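A quick way to see this effect is to compare the largest and smallest pairwise distances as dimensionality grows. In the sketch below (hypothetical uniform data; an assumption for illustration), the max/min ratio shrinks toward 1, so "closest centroid" becomes less informative.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))          # 200 random points in d dimensions
    dist = pdist(X)                   # all unique pairwise Euclidean distances
    print(d, round(dist.max() / dist.min(), 2))
```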
K-NN VS K-MEANS
FUZZY CLUSTERING
CLUSTERING SCHEMAS
EXAMPLE: FUZZY C-MEANS
FUZZY C-MEANS ALGORITHM
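The slide body for the algorithm did not survive extraction. Below is a minimal NumPy sketch of the standard FCM iteration (alternating centroid and membership updates), assuming a fuzzifier m = 2 by default; treat it as an illustration rather than the module's exact formulation.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means sketch. Returns (centroids, membership U)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial membership matrix whose rows sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centroid update: membership-weighted mean of the points.
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every point to every centroid.
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-10)          # avoid division by zero
        # Membership update: inverse distances raised to 2/(m-1), normalized.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:    # stop once memberships stabilize
            U = U_new
            break
        U = U_new
    return centroids, U
```

Each row of U gives one object's degrees of membership across all clusters, which is exactly the "belongs to more than one group" idea described earlier.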
FCM ADVANTAGES & DISADVANTAGES
HIERARCHICAL CLUSTERING
EXAMPLE: AGGLOMERATIVE CLUSTERING
AGGLOMERATIVE CLUSTERING
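The worked figures for this example were lost in extraction. The sketch below shows the same bottom-up idea with SciPy's linkage on hypothetical 2-D points: every point starts as its own cluster, the two closest clusters are merged repeatedly, and the resulting tree is cut into two flat clusters. The points and the "single" linkage choice are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D points standing in for the lost slide figures.
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Bottom-up merging; "single" = nearest-neighbour linkage
# ("complete", "average", and "ward" are common alternatives).
Z = linkage(X, method="single")

# Cut the resulting tree to obtain a flat clustering with two clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```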
DENSITY-BASED CLUSTERING
K-MEANS VS DENSITY-BASED CLUSTERING
EXAMPLE: DBSCAN
DBSCAN
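As a concrete illustration (the slide body did not survive extraction), the sketch below runs scikit-learn's DBSCAN on hypothetical data with two dense blobs and one outlier; the eps and min_samples values are assumptions chosen for this data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: two dense blobs plus one far-away outlier.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(50, 2)),
    rng.normal([5, 5], 0.3, size=(50, 2)),
    [[20, 20]],                      # outlier
])

# eps: neighbourhood radius; min_samples: points needed for a dense core.
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
print(set(db.labels_))               # label -1 marks noise points
```

Note that, unlike K-means, the number of clusters is discovered from the density structure, and the outlier is labelled as noise instead of dragging a centroid.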
DBSCAN: ADVANTAGES & DISADVANTAGES