Techniques of Cluster Analysis: A Seminar On

Cluster analysis techniques are used to group similar objects together. There are hierarchical and non-hierarchical clustering methods. Hierarchical clustering creates nested clusters organized as a tree structure using either agglomerative or divisive approaches. Non-hierarchical methods like k-means and k-medoids clustering partition objects into a predefined number of clusters by optimizing cluster centers or medoids. Clustering algorithms are widely applied in marketing research for tasks like market segmentation and understanding customer behavior.

Uploaded by

VAIBHAV NANAWARE

A seminar on

Techniques of Cluster Analysis

Group members

Munner Mohammad 47
Vaibhav Nanaware 52
Nishant Nirmal 55
Tejas Pawar 60
Bhagwat Shinde 72
Talal Saeed 48

Under the Guidance of

Dr. PRIYADARSHAN DHABE
DEPARTMENT OF IT & MCA, VIT PUNE

Content
1. What is Clustering?

2. Cluster Analysis in Marketing Research.

3. Use of Cluster Analysis In Marketing.

4. Working of Clustering?

5. Different types of Cluster Analysis Technique.

6. Clustering Algorithms

7. Conclusion

8. References.

What is Clustering?

➢ Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects based on their similarity.

➢ Cluster Variate
- A mathematical representation of the selected set of variables that is used to compare the objects' similarities.

SOURCE :- Cluster Analysis

Cluster Analysis in Marketing Research

➢ Grouping similar customers and products is a fundamental marketing concept. It is used, for example, in market segmentation.

➢ As companies cannot connect with all their customers, they have to divide the market into groups of consumers, customers, or clients with similar needs and wants.

Marketing Segmentation

Use of cluster analysis in marketing
➢ Data Reduction

➢ Potential opportunities for products

➢ Understanding of consumer behavior in the market

➢ Hypothesis generation

Source: use of clustering
How does a cluster analysis work?
➢ The primary objective of cluster analysis is to define the structure of the data by placing the most similar observations into groups.

➢ To accomplish this task we must address three basic questions:

• How do we measure similarity?

• How do we form clusters?

• How many clusters do we form?
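The first question, how to measure similarity, is commonly answered with a distance function: the smaller the distance, the more similar two observations are. The sketch below shows two widely used choices, Euclidean and Manhattan distance. The customer tuples are hypothetical and serve only as an illustration; they are not taken from the seminar.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical customers described by (age, yearly spend in thousands).
customer_a = (25, 40)
customer_b = (27, 42)
customer_c = (60, 5)

# customer_a is far closer to customer_b than to customer_c under both measures.
print(euclidean(customer_a, customer_b), euclidean(customer_a, customer_c))
print(manhattan(customer_a, customer_b), manhattan(customer_a, customer_c))
```

In practice the variables are usually scaled first, since a variable with a large numeric range would otherwise dominate the distance.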

Deriving Clusters

➢ There are a number of methods available to carry out clustering.
➢ They can be classified as follows:

• Hierarchical Clustering Analysis

• Non-Hierarchical Clustering Analysis

Source: clustering
Hierarchical Clustering Analysis

➢ A hierarchical clustering method works by grouping data into a tree of clusters. In its bottom-up form, hierarchical clustering begins by treating every data point as a separate cluster.
➢ Then, it repeatedly executes the following steps:

1) Identify the two clusters that are closest together.
2) Merge these two most similar clusters.

➢ These steps continue until all the clusters are merged together.

Hierarchical Clustering Analysis - continued

➢ In hierarchical clustering, the aim is to produce a hierarchical series of nested clusters.
➢ The two basic methods for generating a hierarchical clustering are:
1. Agglomerative Clustering:
• Also known as the bottom-up approach or hierarchical agglomerative clustering (HAC).
• Produces a structure that is more informative than the unstructured set of clusters returned by flat clustering.
• This clustering algorithm does not require us to prespecify the number of clusters.

2. Divisive Clustering:
• Also known as the top-down approach.
• This algorithm also does not require us to prespecify the number of clusters.
• Top-down clustering requires a method for splitting a cluster. It starts with a single cluster that contains the whole data set and proceeds by splitting clusters recursively until each data point is in its own singleton cluster.
Agglomerative Algorithm

➢ The algorithm for agglomerative hierarchical clustering is:

1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (calculate the proximity matrix).
3. Merge the clusters that are most similar or closest to each other.
4. Recalculate the proximity matrix for the new set of clusters.
5. Repeat Steps 3 and 4 until only a single cluster remains.

Source: Agglomerative image
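The agglomerative procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the seminar's own implementation. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), neither of which is specified in the slides, and the sample points are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(c1, c2, points):
    """Cluster-to-cluster proximity: distance between the closest pair of members."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points):
    """Merge clusters bottom-up and return the merge history."""
    # Every data point starts as its own cluster (stored as a tuple of indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Recompute proximities between all current clusters; pick the closest pair.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        distance = single_linkage(a, b, points)
        # Merge the pair and continue until a single cluster remains.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, distance))
    return merges


# Hypothetical 2-D points: two tight pairs and one outlying point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = agglomerative(points)
for a, b, distance in merges:
    print(a, b, round(distance, 2))
```

The merge history is what a dendrogram visualises: cutting it at a chosen distance yields a flat clustering without having fixed the number of clusters in advance.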
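Divisive Algorithm
➢ Divisive hierarchical clustering is precisely the opposite of agglomerative hierarchical clustering: it starts with all data points in a single cluster and recursively splits clusters until each data point forms its own cluster.

Source: divisive image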

Non hierarchical clustering
❖ Non-hierarchical clustering forms new clusters by merging or splitting existing clusters [1].

❖ It does not follow a tree-like structure.

❖ Non-hierarchical clustering methods

● K-means

● Density-based

Source: https://new.pharmacelera.com/science/clustering-methods
K-means
➢ The K-means algorithm consists of four basic steps:

1) Determining the centers.

2) Assigning the points that are not centers to clusters, according to the distance between the centers and the points.

3) Calculating the new centers.

4) Repeating these steps until stable clusters are obtained.

K-Means - cont'd

❖ K-Means Algorithm

✔ Assign initial values for the means {μ1, μ2, μ3, ..., μk}

✔ Repeat until the assignments no longer change:

• Assign each item xi to the cluster that has the closest mean

• Calculate the new mean for each cluster Cj: $\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$

Source: https://www.gatevidyalay.com/k-means-clustering-algorithm-example/
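The assign-and-update loop of K-means can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, initial means passed in explicitly (instead of chosen at random) so that the run is reproducible, and hypothetical sample points.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain K-means: alternate between assignment and mean update."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the closest mean.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means would not move either
        labels = new_labels
        # Update step: each mean becomes the average of its cluster's points.
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous mean
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, labels


# Hypothetical data: two visually separate groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, labels = kmeans(points, centers=[(1, 1), (8, 8)])
print(labels)
print(centers)
```

The result depends on the initial means, which is why K-means is often run several times with different starting values.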
K-Medoids Clustering

➢ A medoid can be defined as the point in the cluster whose total dissimilarity to all the other points in the cluster is minimal.
➢ The dissimilarity between the medoid (Ci) and an object (Pi) is calculated as E = |Pi - Ci|

➢ The cost in the K-Medoids algorithm is given as $c = \sum_{C_i} \sum_{P_i \in C_i} |P_i - C_i|$

K-Medoids Clustering - cont'd

Algorithm
1. Initialize: select k random points out of the n data points as the medoids.

2. Associate each data point with the closest medoid, using any common distance metric.

3. While the cost decreases:

For each medoid m, and for each data point o that is not a medoid:
1. Swap m and o, associate each data point with the closest medoid, and recompute the cost.
2. If the total cost is more than that in the previous step, undo the swap.
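The swap-based search above can be sketched as follows. This is an illustrative sketch, not the seminar's code. It assumes Manhattan distance as the dissimilarity, takes the initial medoids as an argument instead of drawing them at random, and keeps a swap only when it strictly lowers the cost (a small deviation from the slide, chosen so that the loop is guaranteed to terminate). The data points are hypothetical.

```python
def manhattan(p, q):
    """Dissimilarity |P - C| summed over the coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(points, medoids):
    """Sum of each point's dissimilarity to its closest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


def k_medoids(points, medoids):
    """Greedy swap search: try replacing each medoid with each non-medoid."""
    medoids = list(medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:  # continue while the cost decreases
        improved = False
        for i in range(len(medoids)):
            for o in points:
                if o in medoids:
                    continue
                candidate = medoids[:i] + [o] + medoids[i + 1:]
                cost = total_cost(points, candidate)
                if cost < best:  # keep the swap; otherwise it is undone
                    medoids, best = candidate, cost
                    improved = True
    return medoids, best


# Hypothetical data with deliberately off-centre starting medoids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, cost = k_medoids(points, [(1, 2), (9, 8)])
print(medoids, cost)
```

Unlike a K-means center, each medoid is always one of the actual data points, which makes the method less sensitive to outliers.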
Example for K-medoids Clustering

➢ Let's consider the data set below to understand how K-medoids clustering works.

Source: graphical-representation

Source: data-set table

Continued
➢ Step 1:
Select k = 2 and let the two randomly selected medoids be C1 = (4, 5) and C2 = (8, 5).
➢ Step 2: Calculating the cost.
➢ Each point is assigned to the medoid with the smaller dissimilarity. The points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
➢ The cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20

Source: Dissimilarity
Continued
➢ Step 3:
➢ A non-medoid point is swapped in for one of the medoids, and each point is again assigned to the cluster whose medoid is least dissimilar. So, the points 1, 2, 5 go to cluster C1 and 0, 3, 6, 7, 8 go to cluster C2.
➢ The new cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
Swap cost = New cost - Previous cost = 22 - 20 = 2 > 0

Source: Dissimilarity
➢ As the swap cost is not less than zero, we undo the swap. Hence (4, 5) and (8, 5) are the final medoids.
➢ The final clustering is therefore the one found in Step 2: points 1, 2, 5 in cluster C1 and points 0, 3, 6, 7, 8 in cluster C2.
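The arithmetic of this example can be checked with a short script. The slides' data-set table was an image and is not reproduced in the text, so the ten points below are an assumption: an illustrative data set chosen so that the medoids (4, 5) and (8, 5) give the quoted cost of 20. The swap candidate (8, 4) is likewise hypothetical, and Manhattan distance is assumed as the dissimilarity.

```python
def manhattan(p, q):
    """Dissimilarity |P - C| summed over the coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(points, medoids):
    """Sum of each point's dissimilarity to its closest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)


# ASSUMPTION: illustrative data, indexed 0..9; not taken from the slides' table.
points = [(8, 7), (3, 7), (4, 9), (9, 6), (8, 5),
          (5, 8), (7, 3), (8, 4), (7, 5), (4, 5)]

previous_cost = total_cost(points, [(4, 5), (8, 5)])  # Step 2 medoids
new_cost = total_cost(points, [(4, 5), (8, 4)])       # hypothetical swap for C2
swap_cost = new_cost - previous_cost

print(previous_cost, new_cost, swap_cost)  # 20 22 2
```

Because the swap cost is positive, the swap makes the clustering worse and is undone, which matches the conclusion of the example.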

Density Based Clustering
Algorithmic steps for DBSCAN clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points. DBSCAN requires two parameters: ε (eps) and the minimum number of points required to form a cluster (minPts).

➢ Step 1.
Start with an arbitrary starting point that has not been visited.

➢ Step 2.
Extract the ε-neighborhood of this point, i.e. all points within distance ε of it.

➢ Step 3.
If this neighborhood contains sufficiently many points (at least minPts), the clustering process starts and the point is marked as visited; otherwise the point is labeled as noise.
Continued
➢ Step 4.

If a point is found to be part of the cluster, then its ε-neighborhood is also part of the cluster, and the above procedure from Step 2 is repeated for all points in that ε-neighborhood. This is repeated until all points in the cluster are determined.

➢ Step 5.

A new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.

➢ Step 6.

This process continues until all points are marked as visited.
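The DBSCAN steps can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, a neighborhood that counts the point itself, and the usual convention that a cluster only keeps growing from points whose own ε-neighborhood holds at least minPts points. The sample points and parameter values are hypothetical.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):  # pick the next unvisited point
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still be claimed by a cluster later
            continue
        cluster += 1  # enough neighbors: a new cluster starts here
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:  # grow the cluster through its neighborhoods
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters is not an input: it follows from ε and minPts, and the isolated point is reported as noise instead of being forced into a cluster.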

Conclusion
➢ Clustering is one of the important methods for data mining applications.

➢ We have seen various algorithms that are used for clustering, such as DBSCAN and agglomerative clustering, but the most widely used algorithm is K-means.

➢ Clustering helps in understanding the natural grouping in a dataset.

➢ The quality of clustering depends on both the similarity measure used by the method and its implementation.

References
[1] Gulagiz, F. K., Sahin, S., Comparison of Hierarchical and Non-Hierarchical Clustering Algorithms, International Journal of Computer Engineering and Information Technology, January 2017, 6-14 (available online).

[2] Alpaydın, E., Zeki Veri Madenciliği: Ham Veriden Altın Bilgiye Ulaşma Yöntemler, Bilişim 2000, Veri Madenciliği Eğitim Semineri, 2000.

[3] Likas, A., Vlassis, N., Verbeek, J. J., The Global K-Means Clustering Algorithm, Pattern Recognition, 2003, 36(2), pp 451-461.

[4] Capaldo, R., Collova, F., Clustering: A survey, http://www.slideshare.net/rcapaldo/cluster-analysis-presentation, 2008.

[5] Moreira, A., Santos, M. Y., Carneiro, S., Density-based clustering algorithms – DBSCAN and SNN.

[6] Kaufman, L., Rousseeuw, P. J., Clustering by Means of Medoids, Statistical Data Analysis Based on the L1-Norm and Related Methods, Springer, 1987.

THANK YOU
