L07 Clustering algorithms
What is unsupervised learning?
● An area of machine learning that deals with methods for analysing and clustering datasets without explicit classifications.
● Operates on unlabeled data, independently discovering underlying patterns and insights without the need for human intervention (IBM, 2023).
[Diagram: unsupervised learning techniques — K-means clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), OPTICS, Gaussian Mixture Model (GMM), Hierarchical Clustering, Latent Dirichlet Allocation (LDA)]
Unsupervised Learning Techniques
I. K-Means Clustering - Popular for grouping based on similarities.
II. DBSCAN - Groups closely packed points; effective for various shapes and sizes.
III. Gaussian Mixture Model - Assumes data come from a finite mixture of Gaussian distributions.
IV. Mean Shift - Iteratively shifts points towards dense regions.
V. Spectral Clustering - Effective for identifying non-globular clusters.
VI. BIRCH - Summarises large datasets hierarchically before clustering.
K-MEANS CLUSTERING PROCESSES
1) Determine K, the number of clusters.
2) Randomly choose K data points (seeds) to be the initial centroids (cluster centers).
3) Assign each data point to the closest centroid.
4) Re-compute the centroids using the current cluster memberships.
5) If a convergence criterion is not met, go to step 3. Otherwise, stop when the centroids don't change.
Steps of K-Means Clustering
Initialization: Start by deciding the number (k) of clusters to create. Initialize k centroids randomly.
Assignment: Assign each data point to the nearest centroid, forming k clusters.
Update: Recalculate the centroids as the center of the clusters.
Repeat: Continue the assignment and update steps until the centroids do not change significantly, indicating that the clusters are stable.
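The steps above translate almost line-for-line into code. Below is a minimal NumPy sketch of this loop (the `kmeans` function name, toy data, and tolerance are illustrative assumptions, not from a particular library):

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: decide K and pick k random data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: re-compute centroids from the current memberships
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 5: stop once the centroids no longer change significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            return labels, new_centroids
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, k=2)
```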
K-MEANS CLUSTERING EVALUATION METRICS
Elbow Method:
Plots explained variation as a function of the number of clusters.
Identifies the 'elbow point' in the curve as the optimal number of clusters.
Adds clusters until additional clusters do not significantly improve the model.
Silhouette Coefficient:
Measures how similar an object is to its own cluster compared to other clusters.
Calculated as: s = (b − a) / max(a, b), where:
a: mean intra-cluster distance.
b: mean nearest-cluster distance.
Ranges from −1 to 1, where higher values indicate better-matched objects within clusters.
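As a hedged illustration, both metrics can be computed with scikit-learn (assuming it is available; the blob dataset and the range of k are arbitrary):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squares: plotted against k it
    # gives the elbow curve; silhouette_score averages (b - a) / max(a, b)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```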
K-MEANS CLUSTERING APPLICATIONS
Human Resource Information System:
Study on a K-means clustering algorithm based on the Spark platform.
Clusters employees by their characteristics for efficient human resources recommendations.
Enables personalized talent management and better understanding of employee behavior
and preferences.
Ref: https://shorturl.at/Q57uS
K-means Clustering
Pros:
Simple and easy to implement.
Can efficiently handle large datasets with many variables and observations.
Cons:
The user must specify the number of clusters (K) in advance, which can be challenging when the optimal number of clusters is unknown.
Outliers can significantly affect the centroids and cluster assignments, potentially leading to inaccurate clustering results.
DBSCAN
Definition
Overview of DBSCAN:
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a prominent clustering algorithm.
Notable for identifying clusters of arbitrary shapes and sizes.
Effectively handles noise and outliers.
Operational Parameters:
Eps (Epsilon): Defines the radius of the neighborhood around a point.
MinPts (Minimum Points): Specifies the minimum number of points required to form a dense region.
Key Features:
Does not require the number of clusters to be specified in advance.
Capable of discovering clusters with varied shapes and densities.
Points in sparse areas are classified as noise.
DBSCAN
Process
DBSCAN Clustering Steps:
Identification of Core Points:
Determines if each point has a minimum number of neighbors within a given distance (Eps).
Core points are identified as the starting points for cluster formation.
Cluster Expansion:
Recursively connects all directly reachable points from each core point.
Continues to expand the cluster by aggregating all connected points.
Handling Noise:
Points that are not reachable from any core point are labeled as noise.
Effectively separates outliers from main clusters.
Iteration and Result:
Iterates the process until all points are either assigned to clusters or labeled as noise.
Results in distinct clusters that capture the dense regions of the dataset.
Step of DBSCAN
Core Points: In a given dataset, classify each point as a core point, border point, or noise point, based on the number of points within a given radius (ε).
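A small scikit-learn sketch of this classification (the eps/min_samples values and the two-moons toy data are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)   # Eps and MinPts from the slides
labels = db.labels_                          # a label of -1 marks noise points
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True    # core points vs. border/noise points
```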
Steps of the Gaussian Mixture Model (EM algorithm)
Expectation (E-step): For each point, compute the probability that it belongs to each cluster (Gaussian distribution).
Maximization (M-step): Update the parameters of the Gaussians to maximize the likelihood of the data points.
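In scikit-learn both steps run inside `fit()`; a minimal sketch (toy data assumed) exposing the E-step's soft memberships and the M-step's fitted parameters:

```python
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)  # EM iterates inside fit()
probs = gmm.predict_proba(X)   # E-step style output: soft membership per Gaussian
print(gmm.means_)              # M-step output: fitted component means
```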
GAUSSIAN MIXTURE MODEL
Evaluation Metrics
Log-Likelihood:
Higher values suggest better data fit by the model.
Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC):
Lower values indicate superior model fit while considering model complexity.
Classification Accuracy:
Measures accuracy of cluster assignments if true labels are available.
Flexibility and Capabilities of GMM:
Flexible cluster shapes based on Gaussian distributions.
Capable of identifying complex data structures with appropriate cluster number and initialization.
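A hedged sketch of these metrics in scikit-learn (toy data; the candidate range of components is arbitrary):

```python
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, random_state=0)
for n in range(1, 8):
    g = GaussianMixture(n_components=n, random_state=0).fit(X)
    # lower BIC/AIC is better; score() is the average log-likelihood per sample
    print(n, round(g.bic(X), 1), round(g.aic(X), 1), round(g.score(X), 3))
```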
GAUSSIAN MIXTURE MODEL
Application
GMM in Speech Emotion Recognition:
Utilized to extract emotional states from speech signal datasets.
Aims for high accuracy in detecting emotions like anger, calmness, fear, happiness, and sadness.
Significant implications for improving human-machine interactions.
Enhances applications in healthcare, education, marketing, and advertising.
GMM in Personality Traits and Physiological Responses:
Employed to automatically cluster individuals based on personality traits and electrocardiogram responses during stress recovery.
Revealed associations between personality traits (e.g., neuroticism, extraversion) and physiological responses (e.g., electrocardiogram, salivary cortisol).
Highlights the utility of GMM in understanding the relationship between personality and physiological stress manifestations.
Gaussian Mixture Models
Pros:
Can capture complex data distributions by modeling them as a combination of multiple Gaussian distributions.
Soft clustering: data points are assigned probabilities of belonging to each cluster.
Accommodates different cluster shapes and sizes.
Cons:
Training GMMs involves estimating parameters such as means, covariances, and mixture weights, which can be computationally expensive, especially for high-dimensional data or large datasets.
Prone to overfitting, requiring careful regularization.
Performance can be sensitive to the initialization of parameters, leading to suboptimal solutions or convergence to local optima.
MEANSHIFT CLUSTERING
Definition
Non-parametric clustering technique for data analysis and image processing.
Identifies dense areas of data points iteratively.
Shifts each point towards the densest area in its vicinity.
Iteratively updates point locations by computing means within a specified region (bandwidth or radius).
Continues shifting until convergence, where points cease significant movement, defining cluster centers.
MEANSHIFT CLUSTERING
Process
Steps of Mean Shift Clustering:
Initialization:
Set a bandwidth parameter to determine neighborhood size.
Mean Computation:
For each data point, compute the mean within its neighborhood. Shift the point to this mean.
Iteration:
Repeat the process iteratively until convergence.
Convergence occurs when points stop moving significantly or meet a predefined threshold.
Cluster Formation:
Form clusters based on the proximity of points to each other after convergence.
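These four steps map onto scikit-learn's MeanShift; a minimal sketch (the bandwidth quantile and toy blobs are illustrative assumptions):

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
bw = estimate_bandwidth(X, quantile=0.2)   # initialization: choose neighborhood size
ms = MeanShift(bandwidth=bw).fit(X)        # mean computation + iteration to convergence
print(ms.cluster_centers_)                 # cluster formation: the converged modes
```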
MEANSHIFT CLUSTERING
Evaluation Metrics
Convergence Time:
Reflects how swiftly the algorithm converges.
Depends on initial data distribution and chosen bandwidth.
Cluster Cohesion and Separation:
Evaluates effectiveness in forming distinct and coherent clusters.
Assessed using metrics like silhouette score.
Robustness to Noise:
Mean Shift is robust to noise and outliers.
Naturally gravitates towards high-density regions, largely ignoring sparse outliers.
MEANSHIFT CLUSTERING
Application
Online Personality Traits Mining:
Mean Shift Clustering applied for constructing a 14-cluster personality traits model.
Utilizes online user text features and behavioral characteristics.
Offers a scalable and objective method for mining personality traits online.
Applicable in various domains, including online learning.
Social Media Analysis:
Mean Shift Clustering used for content clustering and classification in social media analysis.
Can be extended to understand personality traits by analyzing social media posts.
Clusters based on language, sentiment, or behavior.
Enables identification of patterns and relationships between personality traits and online behaviors.
Facilitates more effective marketing, social influence, and mental health monitoring.
SPECTRAL CLUSTERING
Definition
Technique utilizing eigenvalues of the similarity matrix for dimensionality reduction before clustering.
Treats clustering as a graph-partitioning problem.
Effective in identifying non-globular clusters.
Capable of discovering clusters with complex shapes.
SPECTRAL CLUSTERING
Process
Steps of Spectral Clustering:
Represent data points as nodes in a graph.
Establish edge weights based on similarity between nodes.
Construct an adjacency matrix reflecting these weights.
Formulate a Laplacian matrix from the adjacency matrix.
Perform eigenvalue decomposition of the Laplacian matrix.
Obtain eigenvectors defining a reduced space.
Cluster data in this reduced space using a conventional algorithm like k-means.
Steps of Spectral Clustering
1. Similarity Graph: Build a similarity graph among all data points, typically using a measure like Gaussian (radial basis function) similarity.
2. Laplacian Matrix: Compute the Laplacian matrix from the similarity graph.
3. Eigenvalue Decomposition: Compute the eigenvalues and eigenvectors of the Laplacian matrix.
4. K-means on Eigenvectors: Use the eigenvectors corresponding to the k smallest non-zero eigenvalues to embed the data points into a lower-dimensional space, and then apply K-means clustering to cluster these points.
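A compact scikit-learn sketch of this pipeline (the gamma value and two-moons data are illustrative assumptions):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
# affinity='rbf' builds the Gaussian similarity graph (step 1); the Laplacian,
# its eigen-decomposition, and k-means on the embedding (steps 2-4) all run
# inside fit_predict()
sc = SpectralClustering(n_clusters=2, affinity='rbf', gamma=20, random_state=0)
labels = sc.fit_predict(X)
```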
SPECTRAL CLUSTERING
Evaluation Metrics
Normalized Mutual Information (NMI):
Evaluates clustering quality by comparing predicted clusters to ground truth.
Accounts for potential information gain or loss through clustering.
Running Time:
Measures algorithm efficiency, particularly important due to computational intensity.
Eigenvalue decomposition and similarity matrix construction can be resource-intensive.
• Affinity Matrix (A), Degree Matrix (D), Graph Laplacian Matrix (L = D − A in its unnormalized form)
SPECTRAL CLUSTERING
Application
Personality Traits and Body Mass Index (BMI):
Utilized Spectral Clustering to identify personality trait clusters associated with BMI.
Revealed 14 trait clusters demonstrating well-established associations between personality traits and BMI.
Personality Types Revisited:
Algorithmic approach applied to Big Five traits dataset.
Resulted in a five-cluster solution: resilient, overcontroller, undercontroller, reserved, and vulnerable-resilient.
Provides insights into various personality prototypes based on Big Five traits.
Spectral Clustering
Pros:
Captures complex cluster structures.
Handles non-linear decision boundaries.
Works well for data with irregular shapes or clusters of varying densities.
Cons:
Sensitivity to parameter choices.
Computational complexity.
Scalability issues for large datasets.
BIRCH
● Aka “Balanced Iterative Reducing and Clustering using Hierarchies”
● Handles large datasets
a. Creates a condensed summary of the dataset
b. Clusters the summary.
● 4 Phases:
a. Loading
b. Optional Condensing
c. Global Clustering
d. Optional Refining
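A brief scikit-learn sketch of these phases (parameter values are illustrative assumptions):

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)
# threshold and branching_factor shape the condensed CF-tree summary
# (loading/condensing phases); n_clusters runs the global clustering phase
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=5).fit(X)
labels = birch.predict(X)
```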
Affinity Propagation
Introduction
● Clusters are automatically identified without specifying the number of clusters in advance.
● Utilizes message passing between data points.
Key Steps
1. Similarity Calculation
2. Responsibility Calculation
3. Availability Calculation
4. Iterative Update
5. Net Responsibility Calculation
6. Exemplar Selection
7. Cluster Assignment
Matrices
A. Similarity Matrix (S)
B. Responsibility Matrix (R)
C. Availability Matrix (A)
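A minimal scikit-learn sketch covering the key steps above (toy data assumed; the message-passing iterations happen inside `fit()`):

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
ap = AffinityPropagation(random_state=0).fit(X)  # responsibility/availability updates
exemplars = ap.cluster_centers_indices_          # exemplar selection
labels = ap.labels_                              # cluster assignment
```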
Hierarchical Clustering
Hierarchical clustering builds a tree-like hierarchy of clusters by recursively merging or splitting clusters based on their similarities or dissimilarities.
1. Initialisation
2. Distance Matrix Calculation
3. Merge Clusters
4. Update Distance Matrix
5. Repeat Steps 3-4
6. Dendrogram Construction
7. Cluster Selection
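A short SciPy sketch of these steps (random toy data; Ward linkage and the cut into 3 clusters are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).random((30, 2))
Z = linkage(X, method='ward')                    # steps 2-5: distances + iterative merging
labels = fcluster(Z, t=3, criterion='maxclust')  # step 7: cut the tree into 3 clusters
# dendrogram(Z) draws the tree from step 6 (needs matplotlib)
```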
Hierarchical Clustering
Pros:
Produces dendrogram trees that visually represent the clustering process, making it easier to interpret and understand the relationships between clusters.
No need to specify the number of clusters in advance.
It can identify nested clusters, which is useful when the data has a hierarchical structure or when there are meaningful subgroups within larger clusters.
Cons:
Computationally expensive, especially for large datasets, as the algorithm's time complexity is O(n² log n).
Sensitive to noise and outliers.
Once the clustering process is completed, it's challenging to modify the hierarchy without rerunning the algorithm from scratch.
Fuzzy C-means (FCM)
Is a variant of the traditional K-means clustering algorithm.
Is soft clustering: it assigns each data point a membership degree between 0 and 1 for each cluster. (K-means is hard clustering, which assigns each data point to a single cluster.)
1. Initialisation
2. Membership Degree Calculation
3. Cluster Center Update
4. Convergence Check
5. Iteration
6. Result Interpretation
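Since FCM is not part of scikit-learn's core, here is a minimal NumPy sketch of the loop above (the `fuzzy_cmeans` name, random-membership initialisation, and fuzzifier m=2 are illustrative assumptions):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialisation: random membership degrees, each row sums to 1
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # 3. Cluster center update, weighted by memberships raised to m
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # 2. Membership degree calculation from distances to the centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # 4-5. Convergence check; otherwise keep iterating
        if np.abs(U_new - U).max() < tol:
            return U_new, centers
        U = U_new
    return U, centers
```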
Fuzzy C-means (FCM)
Pros:
Ability to handle overlapping clusters: FCM allows data points to belong to multiple clusters simultaneously, providing more flexibility in representing complex data patterns.
Can handle noisy data and outliers better than traditional hard clustering algorithms.
Cons:
Sensitive to the initial selection of cluster centroids.
Difficulty in determining the number of clusters.
Speed: FCM is slower than K-means, as each point is evaluated against each cluster and more operations are involved in each evaluation, whereas K-means only calculates distances.
Latent Dirichlet Allocation (LDA)
● Kretinin, M., & Nguyen, G. (2022). Topic Modeling on News Articles using Latent Dirichlet Allocation. In 2022 IEEE 26th International Conference on Intelligent Engineering Systems (INES). IEEE. https://doi.org/10.1109/ines56734.2022.9922609
● Zhao, L., Zhao, Q., & Wang, Y. (2020). Research on Chinese Movie Reviews Based on Latent Dirichlet Allocation Topic Model. In 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). IEEE. https://doi.org/10.1109/mlbdbi51377.2020.00016
● Karmakar, S., Sivakumar, N., & Pillai, A. S. (2023). Exploring Satisfaction Level of Customers in Restaurants by Using Latent Dirichlet Allocation (LDA) Algorithm. In 2023 International Conference on Inventive Computation Technologies (ICICT). IEEE. https://doi.org/10.1109/icict57646.2023.10134169
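To connect the cited topic-modeling applications to code, here is a hedged scikit-learn sketch of LDA (the toy documents and the two-topic choice are assumptions for illustration only):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie plot was thrilling and the acting superb",
        "great food and friendly service at this restaurant",
        "the film score and cinematography were stunning",
        "the restaurant menu was varied and the food fresh"]
counts = CountVectorizer(stop_words='english').fit_transform(docs)  # bag-of-words
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-document topic mixtures
```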
Latent Semantic Analysis (LSA)
Pipeline: Representation in Semantic Space (Matrix) + Dimensionality Reduction (SVD) + Clustering (sketched in code after the references below).
Applications: Topic Modelling on User Reviews in an E-Commerce Platform; Topic Modelling on Perception towards Government.
● Chehal, D., Gupta, P., & Gulati, P. (2020). RETRACTED ARTICLE: Implementation and comparison of topic modeling techniques based on user reviews in e-commerce recommendations. Journal of Ambient Intelligence and Humanized Computing, 12(5), 5055–5070. Springer Science and Business Media LLC. https://doi.org/10.1007/s12652-020-01956-6
● Qomariyah, S., Iriawan, N., & Fithriasari, K. (2019). Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis. In AIP Conference Proceedings: The 2nd International Conference on Science, Mathematics, Environment, and Education. AIP Publishing. https://doi.org/10.1063/1.5139825a
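The Matrix + SVD + Clustering pipeline above can be sketched with scikit-learn as follows (the toy review documents and the two-component embedding are illustrative assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["fast shipping and great packaging",
        "delivery was slow and the package arrived late",
        "excellent product quality, highly recommended",
        "poor quality, the item broke after a week"]
X = TfidfVectorizer().fit_transform(docs)           # representation in semantic space
Z = TruncatedSVD(n_components=2).fit_transform(X)   # dimensionality reduction (SVD)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)  # clustering
```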