Techniques of Cluster Analysis: A Seminar On
Group members
Munner Mohammad 47
Vaibhav Nanaware 52
Nishant Nirmal 55
Tejas Pawar 60
Bhagwat Shinde 72
Talal Saeed 48
4. Working of Clustering?
6. Clustering Algorithms
7. Conclusion
8. References.
What is Clustering?
Cluster Variate
- a mathematical representation of the selected set of
variables used to compare the objects' similarities.
Cluster Analysis in Marketing Research
Marketing Segmentation
Use of cluster analysis in marketing
Data Reduction
Hypothesis generation
Source: use of clustering
How does a cluster analysis work?
The primary objective of cluster analysis is to
define the structure of the data by placing the
most similar observations into groups.
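A minimal sketch of what "most similar" can mean in practice, assuming Euclidean distance as the similarity measure. The observations and the helper name `pairwise_distances` are hypothetical and do not come from the slides.

```python
from itertools import combinations
from math import dist

# Hypothetical observations measured on two variables (not from the slides).
observations = {
    "A": (1.0, 1.0),
    "B": (1.5, 1.2),
    "C": (5.0, 5.0),
    "D": (5.2, 4.8),
}

def pairwise_distances(points):
    """Return {(name1, name2): Euclidean distance} for every pair of observations."""
    return {
        (a, b): dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }

distances = pairwise_distances(observations)

# The pair with the smallest distance is the most similar pair,
# and therefore the first candidate for being placed in the same group.
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 3))
```

Any other dissimilarity measure could be swapped in for `dist`. The grouping logic only needs a way to rank pairs of observations by closeness.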
Deriving Clusters
Source: clustering
Hierarchical Clustering Analysis
Hierarchical Clustering Analysis - continued
2. Divisive Clustering:
• Also known as the top-down approach.
• This algorithm also does not require the number of clusters to be
specified in advance.
• Top-down clustering requires a method for splitting a cluster. It
starts with a single cluster that contains the whole data set and
splits clusters recursively until every data point is in its own
singleton cluster.
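The recursive splitting described above can be sketched as follows. The slides do not say which splitting method is used, so this sketch assumes one simple option: seed the two halves with the two most distant points and send every other point to the nearer seed. The function names `split` and `divisive` are hypothetical.

```python
from itertools import combinations
from math import dist

def split(cluster):
    """Split a cluster in two, seeded by its two most distant points.

    Assumes all points in the cluster are distinct, so both halves are non-empty.
    """
    seed_a, seed_b = max(combinations(cluster, 2), key=lambda pair: dist(*pair))
    left = [p for p in cluster if dist(p, seed_a) <= dist(p, seed_b)]
    right = [p for p in cluster if dist(p, seed_a) > dist(p, seed_b)]
    return left, right

def divisive(cluster):
    """Top-down clustering: split recursively until only singletons remain.

    Returns a nested tuple that mirrors the tree of splits; each leaf is one point.
    """
    if len(cluster) == 1:
        return cluster[0]
    left, right = split(cluster)
    return (divisive(left), divisive(right))

# Start from one cluster that contains the whole (hypothetical) data set.
tree = divisive([(1, 1), (1, 2), (8, 8), (9, 8)])
print(tree)
```

Cutting the tree at a chosen depth gives a flat clustering, which is why the number of clusters does not have to be fixed before the algorithm runs.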
Agglomerative Algorithm
Source: Agglomerative image
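Since the agglomerative procedure is only shown as an image, here is a minimal sketch of the usual bottom-up idea: start with every point as its own cluster and repeatedly merge the two closest clusters. The single-linkage distance and the names `single_linkage` and `agglomerative` are assumptions made for illustration, not the slide's own method.

```python
from itertools import combinations
from math import dist

def single_linkage(c1, c2):
    """Cluster distance (assumed): distance between the two closest members."""
    return min(dist(p, q) for p in c1 for q in c2)

def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]   # start: every point is its own cluster
    merges = []                        # record of merges, in the order they happen
    while len(clusters) > n_clusters:
        # Find the indices of the two clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

clusters, merges = agglomerative([(1, 1), (1, 2), (8, 8), (9, 8)], n_clusters=2)
print(clusters)
```

The `merges` list is the information a dendrogram is drawn from. Other linkage rules (complete, average) would only change `single_linkage`.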
Divisive Algorithm
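Divisive hierarchical clustering is precisely the opposite of
agglomerative hierarchical clustering: it starts from one cluster that
contains all the data and splits it, instead of starting from
singleton clusters and merging them.
Source: divisive image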
Non-hierarchical clustering
❖ Non-hierarchical clustering forms new clusters by merging or
splitting existing clusters [1].
● K-means
● Density-based
Source: https://new.pharmacelera.com/science/clustering-methods
K-means
The K-Means algorithm consists of four basic steps:
1) Determination of centers.
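Only the first of the four steps is listed above. The sketch below fills in the rest with the usual textbook formulation (assign each point to its nearest center, recompute the centers, repeat until nothing changes), which is an assumption here. Taking the first k points as initial centers is also just a simple, repeatable choice, and the function name `k_means` is hypothetical.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Plain k-means sketch; returns (centers, clusters)."""
    # Step 1: determination of centers (assumed: the first k points).
    centers = list(points[:k])
    clusters = []
    for _ in range(max_iter):
        # Assumed step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Assumed step 3: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        # Assumed step 4: repeat until the centers stop moving.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data set, not taken from the slides.
data = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters = k_means(data, k=2)
print(centers)
```

In practice the initial centers are usually chosen at random, and the result can depend on that choice.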
K-Means - continued
❖ K-Means Algorithms
Source: https://www.gatevidyalay.com/k-means-clustering-algorithm-example/
K-Medoids Clustering
K-Medoids Clustering - continued
Algorithm
1. Initialize: select k random points out of the n data
points as the medoids.
Source: graphical-representation
Continued
Step 1:
Select k = 2 and let the two randomly selected medoids be
C1 = (4, 5) and C2 = (8, 5).
Step 2: Calculating cost.
Each point is assigned to the cluster of its nearest medoid, so the
points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Source: Dissimilarity
Continued
Step 3:
After a medoid is tentatively swapped with a non-medoid point, each
point is assigned to the cluster whose medoid is least dissimilar to
it. So, the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go
to cluster C2.
New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost - Previous cost = 22 - 20 = 2 > 0
Source: Dissimilarity
As the swap cost is not less than zero, we undo the swap.
Hence the medoids from Step 1, C1 = (4, 5) and C2 = (8, 5), are the
final medoids.
The resulting clustering is: points 1, 2, 5 in cluster C1 and
points 0, 3, 6, 7, 8 in cluster C2.
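The cost arithmetic in Steps 2 and 3 can be checked with a short sketch. The data table is not reproduced in the text, so the ten points below are an assumption, chosen because they reproduce the quoted costs of 20 and 22 with the medoids (4, 5) and (8, 5). Manhattan distance as the dissimilarity and point 7 as the swap candidate are likewise assumptions.

```python
# Assumed data set, indexed 0-9 (NOT given in the text above).
points = [(8, 7), (3, 7), (4, 9), (9, 6), (8, 5),
          (5, 8), (7, 3), (8, 4), (7, 5), (4, 5)]

def manhattan(p, q):
    """Assumed dissimilarity: sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(points, medoids):
    """Sum of each point's dissimilarity to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

medoids = [(4, 5), (8, 5)]            # C1 and C2 from Step 1
previous_cost = total_cost(points, medoids)

candidate = [(4, 5), (8, 4)]          # assumed swap: C2 replaced by point 7
new_cost = total_cost(points, candidate)

swap_cost = new_cost - previous_cost
# Keep the swap only if it lowers the total cost; otherwise undo it.
final_medoids = candidate if swap_cost < 0 else medoids
print(previous_cost, new_cost, swap_cost, final_medoids)
```

A full K-Medoids run would repeat this test for other medoid and non-medoid pairs until no swap reduces the cost.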
Density Based Clustering
Algorithmic steps for DBSCAN clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires
two parameters: ε (eps) and the minimum number of points required to
form a cluster (minPts).
Step 1.
Start with an arbitrary starting point that has
not been visited.
Step 2.
Extract the neighborhood of this point using ε.
Step 3.
If there are sufficient neighbors around this point (at least
minPts), the clustering process starts and the point is marked as
visited; otherwise the point is labeled as noise.
Continued
Step 4.
Step 5.
Step 6.
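A minimal DBSCAN sketch, under assumptions. Steps 1 to 3 follow the text above. Steps 4 to 6 are not spelled out there, so the cluster-expansion part follows the commonly described form of the algorithm (grow the cluster through the neighborhoods of its dense points, then move on to the next unvisited point). The names `region_query`, `dbscan` and `NOISE` are hypothetical.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)          # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):           # Step 1: pick an unvisited point
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)   # Step 2: eps-neighborhood
        if len(neighbors) < min_pts:       # Step 3: too sparse, label as noise
            labels[i] = NOISE
            continue
        cluster_id += 1                    # Step 3: dense enough, start a cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:                       # assumed expansion of the cluster
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # earlier noise becomes a border point
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # dense point: keep growing from it
    return labels

# Hypothetical data: two dense groups and one isolated point.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
print(dbscan(data, eps=1.5, min_pts=3))
```

Unlike K-Means, the number of clusters is not an input here. It follows from ε and minPts.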
Conclusion
Clustering is one of the important methods for data mining
applications.
References
[1] Gulagiz, F. K., Sahin, S., Comparison of Hierarchical and Non-Hierarchical Clustering
Algorithms, International Journal of Computer Engineering and Information Technology,
January 2017, pp 6-14 (available online).
[2] Alpaydın, E., Zeki Veri Madenciliği: Ham Veriden Altın Bilgiye Ulaşma Yöntemler, Bilişim 2000,
Veri madenciliği Eğitim Semineri, 2000.
[3] Likas, A., Vlassis, N., Verbeek, J. J., The Global K-Means Clustering Algorithm, Pattern
Recognition, 2003, 36(2), pp 451-461.
[5] Moreira, A., Santos, M. Y., Carneiro, S., Density-Based Clustering Algorithms - DBSCAN
and SNN.
[6] Kaufman, L., Rousseeuw, P. J., Clustering by Means of Medoids, Statistical Data Analysis
Based on the L1-Norm and Related Methods, Springer, 1987.
THANK YOU