A Seminar on Techniques of Cluster Analysis

Cluster analysis techniques are used to group similar objects together. There are hierarchical and non-hierarchical clustering methods. Hierarchical clustering creates nested clusters organized as a tree structure using either agglomerative or divisive approaches. Non-hierarchical methods like k-means and k-medoids clustering partition objects into a predefined number of clusters by optimizing cluster centers or medoids. Clustering algorithms are widely applied in marketing research for tasks like market segmentation and understanding customer behavior.


A Seminar on

Techniques of Cluster Analysis
Group members

Munner Mohammad 47
Vaibhav Nanaware 52
Nishant Nirmal 55
Tejas Pawar 60
Bhagwat Shinde 72
Talal Saeed 48

Under the Guidance of


Dr. Priyadarshan Dhabe
Department of IT & MCA, VIT Pune
Content
1. What is Clustering?

2. Cluster Analysis in Marketing Research

3. Uses of Cluster Analysis in Marketing

4. How Does Cluster Analysis Work?

5. Types of Cluster Analysis Techniques

6. Clustering Algorithms

7. Conclusion

8. References
What is Clustering?

 Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess.

 Cluster Variate
- a mathematical representation of the selected set of variables on which the objects' similarities are compared.
Cluster Analysis in Marketing Research

 Grouping similar customers and products is a fundamental marketing concept; it is used, for example, in market segmentation.

 Since companies cannot connect with every customer individually, they have to divide the market into groups of consumers, customers or clients with similar needs and wants.

Marketing Segmentation
Use of cluster analysis in marketing
 Data Reduction

 Potential opportunities for products

 Understanding of consumer behavior in the market

 Hypothesis generation

How does a cluster analysis work?
 The primary objective of cluster analysis is to
define the structure of the data by placing the
most similar observations into groups.

 To accomplish this task, we must address three basic questions:

 How do we measure similarity?

 How do we form clusters?

 How many clusters do we form?

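A small sketch of the first question: similarity is commonly measured with a distance metric such as Euclidean distance (the choice of metric and the sample customer profiles below are assumptions for illustration).

```python
import math

def euclidean(p, q):
    # Euclidean distance between two observations measured on the same variables
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two hypothetical customer profiles: (age, monthly spend)
d = euclidean((25, 300), (30, 340))
# the smaller the distance, the more similar the two customers are
```

Other metrics (e.g. Manhattan or correlation-based distances) can be substituted without changing how clusters are then formed.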
Deriving Clusters

 There are a number of methods available to carry out clustering.
 They can be classified as below:

• Hierarchical Clustering Analysis

• Non-Hierarchical Clustering Analysis

Hierarchical Clustering Analysis

 A hierarchical clustering method works by grouping data into a tree of clusters. Hierarchical clustering begins by treating every data point as a separate cluster.
 Then, it repeatedly executes the following steps:

1) Identify the two clusters that are closest together.

2) Merge these two most similar clusters.

 We continue these steps until all the clusters are merged together.

Hierarchical Clustering Analysis -continued

 In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters.
 The two basic methods to generate a hierarchical clustering are:

1. Agglomerative Clustering:
• Also known as the bottom-up approach or hierarchical agglomerative clustering (HAC).
• Produces a structure that is more informative than the unstructured set of clusters returned by flat clustering.
• This clustering algorithm does not require us to prespecify the number of clusters.

2. Divisive Clustering:
• Also known as the top-down approach.
• This algorithm also does not require us to prespecify the number of clusters.
• Top-down clustering requires a method for splitting a cluster that contains the whole data set, and proceeds by splitting clusters recursively until individual data points have been split into singleton clusters.
Agglomerative Algorithm

 The algorithm for agglomerative hierarchical clustering is:

1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (the proximity matrix).
3. Merge the two clusters that are most similar or closest to each other.
4. Recalculate the proximity matrix for the new set of clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.

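The five steps above can be sketched as a small single-linkage implementation (the Euclidean metric, the single-linkage rule, and the sample points are assumptions; in practice a library routine such as SciPy's hierarchical clustering would be used):

```python
import math

def agglomerative(points):
    """Merge clusters pairwise until one remains; returns the merge history."""
    # Step 1: consider every data point as an individual cluster
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Steps 2 and 4: (re)compute the proximity between all cluster pairs,
        # using single linkage (distance between the closest pair of members)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Step 3: merge the two closest clusters
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merges[-1][0] + merges[-1][1])
    return merges

# Step 5: repeating until a single cluster remains yields n - 1 merges
merges = agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)])
```

The merge history is exactly what a dendrogram visualises: each entry records which two clusters were joined and at what distance.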
Divisive Algorithm
 Divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering: it starts with all data points in a single cluster and splits clusters recursively.

Non hierarchical clustering
❖ Non-hierarchical clustering partitions the data directly into a chosen number of clusters, rather than forming clusters by repeatedly merging or splitting them [1].

❖ It does not follow a tree-like structure.

❖ Non hierarchical clustering methods

● K-means

● Density-based

Source: https://new.pharmacelera.com/science/clustering-methods
K-means
 The K-Means algorithm consists of four basic steps:

1) Determine the initial cluster centers.

2) Assign each point to the cluster whose center is closest, according to the distance between the centers and the points.

3) Calculate the new centers as the mean of each cluster's points.

4) Repeat these steps until the desired clusters are obtained.

K-Means -conti’d

❖ K-Means algorithm:

✔ Assign initial values for the cluster means {μ1, μ2, μ3, …, μk}

✔ Repeat:

• Assign each item xi to the cluster which has the closest mean

• Calculate the new mean for each cluster

Source: https://www.gatevidyalay.com/k-means-clustering-algorithm-example/
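The four K-Means steps above can be sketched in NumPy as follows (the random initialisation, the convergence test, and the sample data are assumptions; a library implementation such as scikit-learn's KMeans would normally be used):

```python
import numpy as np

def k_means(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: determine initial centers (here, k distinct random data points)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 2: assign each point to the closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat until the centers stop moving
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
labels, centers = k_means(X, k=2)
```

Because each center starts at a data point, no cluster is empty on the first pass; a robust implementation would also handle clusters that empty out later.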
K-Medoids Clustering

 A medoid can be defined as the point in a cluster whose total dissimilarity to all the other points in that cluster is minimal.

 The dissimilarity between a medoid Ci and an object Pi is calculated as E = |Pi − Ci|.

 The cost in the K-Medoids algorithm is the total dissimilarity of every point to its medoid:

Cost = Σ (over clusters Ci) Σ (over points Pi in Ci) |Pi − Ci|
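As a hypothetical illustration of this cost, reading |Pi − Ci| as the Manhattan (city-block) dissimilarity between a point and its medoid (the sample points and their assignment below are made up):

```python
def dissimilarity(p, c):
    # E = |Pi - Ci|, taken here as the coordinate-wise (Manhattan) distance
    return sum(abs(a - b) for a, b in zip(p, c))

def total_cost(medoids, assignment):
    # assignment pairs each point with the index of its medoid;
    # the cost sums the dissimilarity of every point to its medoid
    return sum(dissimilarity(p, medoids[i]) for p, i in assignment)

cost = total_cost([(4, 5), (8, 5)],
                  [((7, 6), 0), ((2, 6), 0), ((8, 7), 1)])
# cost = (3 + 1) + (2 + 1) + (0 + 2) = 9
```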
K-Medoids Clustering - conti’d

Algorithm
1. Initialize: select k random points out of the n data points as the medoids.

2. Associate each data point with the closest medoid, using any common distance metric.

3. While the cost decreases:
        For each medoid m and for each data point o which is not a medoid:
                1. Swap m and o, associate each data point with the closest medoid, and recompute the cost.
                2. If the total cost is more than that in the previous step, undo the swap.
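A compact sketch of the swap loop in steps 1–3 (the Manhattan distance and the toy points are assumptions; here a swap is simply never committed unless it lowers the cost, which is equivalent to "undoing" a bad swap):

```python
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cost(points, medoids):
    # step 2: each point is costed against its closest medoid
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def k_medoids(points, k, seed=0):
    random.seed(seed)
    medoids = random.sample(points, k)            # step 1: k random medoids
    best = cost(points, medoids)
    improved = True
    while improved:                               # step 3: while the cost decreases
        improved = False
        for i in range(k):
            for o in points:
                if o in medoids:
                    continue
                trial = medoids[:i] + [o] + medoids[i + 1:]   # swap medoid i and o
                c = cost(points, trial)
                if c < best:                      # keep the swap only if it helps;
                    medoids, best = trial, c      # otherwise it is implicitly undone
                    improved = True
    return medoids, best

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
medoids, c = k_medoids(points, k=2)
```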
Example for K-medoids Clustering

 Let us consider the data set below to understand how K-Medoids clustering works.

(Data-set table and its graphical representation omitted in this copy.)

Continued
 Step 1:
Select k = 2 and let the two randomly chosen medoids be C1 = (4, 5) and C2 = (8, 5).
 Step 2: Calculating the cost.
 The points 1, 2, 5 go to cluster C1 and the points 0, 3, 6, 7, 8 go to cluster C2.
 The cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20

Continued
 Step 3: Swap one of the medoids with a non-medoid point, and assign each point to the cluster whose medoid has the smaller dissimilarity. The points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
 The new cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = new cost − previous cost = 22 − 20 = 2, and 2 > 0

 As the swap cost is not less than zero, we undo the swap; the medoids chosen in Step 1 therefore remain the final medoids.
 The final clustering is formed around these medoids.

Density Based Clustering
Algorithmic steps for DBSCAN clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires
two parameters: ε (eps) and the minimum number of points required to
form a cluster (minPts).

 Step 1.
Start with an arbitrary starting point that has
not been visited.

 Step 2.
Extract the neighborhood of this point using ε

 Step 3.
If there are sufficiently many points in this neighborhood (at least minPts), the clustering process starts and the point is marked as visited; otherwise the point is labeled as noise.
Continued
 Step 4.

If a point is found to be part of a cluster, then its ε-neighborhood is also part of that cluster, and the above procedure from Step 2 is repeated for all points in that ε-neighborhood. This is repeated until all points in the cluster are determined.

 Step 5.

A new unvisited point is retrieved and processed, leading to


the discovery of a further cluster or noise.

 Step 6.

This process continues until all points are marked as visited.

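These six steps can be sketched as a minimal DBSCAN (the values of ε and minPts, the sample points, and the label −1 for noise are assumptions):

```python
import math

def dbscan(points, eps, min_pts):
    labels = {}          # point index -> cluster id, or -1 for noise
    cluster_id = 0

    def neighborhood(i):
        # Step 2: extract the ε-neighborhood of point i (including itself)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):          # Steps 1, 5, 6: visit each point once
        if i in labels:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:          # Step 3: too few neighbors -> noise
            labels[i] = -1
            continue
        cluster_id += 1                   # Step 3: start a new cluster
        labels[i] = cluster_id
        while seeds:                      # Step 4: grow through ε-neighborhoods
            j = seeds.pop()
            if labels.get(j) == -1:
                labels[j] = cluster_id    # noise reachable from a core point
            if j in labels:
                continue
            labels[j] = cluster_id
            reach = neighborhood(j)
            if len(reach) >= min_pts:     # only core points keep expanding
                seeds.extend(reach)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)
# points 0-2 and 3-5 form two clusters; (5, 5) ends up labeled as noise (-1)
```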
Conclusion
 Clustering is one of the important methods for data mining
applications.

 We have seen various algorithms used for clustering, such as DBSCAN and agglomerative clustering, but the most widely used algorithm is K-Means.

 Clustering helps in understanding the natural


grouping in a dataset.

 The quality of clustering depends on both the similarity measure used by the method and its implementation.

References
[1] Gulagiz, F. K. and Sahin, S. (2017). Comparison of Hierarchical and Non-Hierarchical Clustering Algorithms. International Journal of Computer Engineering and Information Technology, January 2017, 6–14 (available online).

[2] Alpaydın, E., Zeki Veri Madenciliği: Ham Veriden Altın Bilgiye Ulaşma Yöntemler, Bilişim 2000,
Veri madenciliği Eğitim Semineri, 2000.

[3] Likas, A., Vlassisb, N., Verbeekb, J. J., The Global K-Means Clustering Algorithm, Pattern
Recognition, 2003, 36(2), pp 451-461.

[4] R. Capaldo and F. Collova, Clustering: A survey, http://www.slideshare.net/rcapaldo/cluster-analysis-presentation, (2008).

[5] Density-based clustering algorithms – DBSCAN and SNN by Adriano Moreira, Maribel Y. Santos
and Sofia Carneiro.

[6] Kaufman, L., Rousseeuw, P. J., Clustering by Means of Medoids, Statistical Data Analysis
Based on The L1– Norm and Related Methods, Springer, 1987.

THANK YOU
