Cluster Analysis: G Sreenivas
G SREENIVAS
Cluster Analysis
● Clustering :
Clustering is the process of grouping a data set in a way
that the similarity between data within a cluster is
maximized while the similarity between data of different
clusters is minimized.
● Clusters :
A cluster is a collection of data objects that are similar
to one another within the same cluster and are
dissimilar to the objects in other clusters.
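The two definitions above can be illustrated with a small sketch. The two point groups below are hypothetical toy data, and average pairwise Euclidean distance is only one assumed way of measuring similarity (a smaller distance means more similar); the helper names are made up for this example.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data: two hand-picked groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

def mean_within(cluster):
    """Average distance over all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(c1, c2):
    """Average distance over all pairs with one point from each cluster."""
    pairs = list(product(c1, c2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For a grouping that matches the definitions, the within-cluster averages should come out clearly smaller than the between-cluster average.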
What Is Good Clustering?
● Dissimilarity/Similarity metric:
Similarity is expressed in terms of a distance function d(i, j),
which is typically a metric.
● Properties
○ d(i,j) ≥ 0
○ d(i,i) = 0
○ d(i,j) = d(j,i)
○ d(i,j) ≤ d(i,k) + d(k,j)
● Also one can use weighted distance, parametric Pearson
product moment correlation, or other dissimilarity
measures.
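The four properties listed above can be spot-checked in code. This is a minimal sketch, assuming Euclidean distance as the metric and a simple per-coordinate weighting as the weighted distance; the function names, the sample points, and the weights are illustrative choices, and passing the check on a few points is not a proof that a function is a metric.

```python
from itertools import product
from math import isclose, sqrt

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def weighted_euclidean(p, q, w):
    """Euclidean distance with one non-negative weight per coordinate."""
    return sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, p, q)))

def satisfies_metric_properties(d, pts, tol=1e-9):
    """Spot-check the four properties of d on every triple drawn from pts."""
    for i, j, k in product(pts, repeat=3):
        if d(i, j) < 0:                                  # d(i,j) >= 0
            return False
        if not isclose(d(i, i), 0.0, abs_tol=tol):       # d(i,i) = 0
            return False
        if not isclose(d(i, j), d(j, i), abs_tol=tol):   # symmetry
            return False
        if d(i, j) > d(i, k) + d(k, j) + tol:            # triangle inequality
            return False
    return True

points = [(2, 4), (5, 2), (8, 9)]   # illustrative sample points
weights = (2.0, 0.5)                # assumed weights, chosen arbitrarily

print(satisfies_metric_properties(euclidean, points))
print(satisfies_metric_properties(
    lambda p, q: weighted_euclidean(p, q, weights), points))
```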
Finding a Centroid
Using the following equation, we can find the centroid of k
n-dimensional points x1, x2, ..., xk:
centroid = (x1 + x2 + ... + xk) / k, computed coordinate by coordinate
Let’s find the centroid of three 2-D points, say (2,4), (5,2), and (8,9):
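The centroid of the three points can be worked out as a coordinate-wise mean. The sketch below assumes that definition (the average of each coordinate over all k points); the function name `centroid` is chosen for this example.

```python
def centroid(points):
    """Coordinate-wise mean of k points that all have the same dimension."""
    k = len(points)
    # zip(*points) groups the first coordinates together, then the second, ...
    return tuple(sum(coords) / k for coords in zip(*points))

result = centroid([(2, 4), (5, 2), (8, 9)])
print(result)  # (5.0, 5.0)
```

By hand: x = (2 + 5 + 8) / 3 = 5 and y = (4 + 2 + 9) / 3 = 5, so the centroid is (5, 5).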
Major Clustering Approaches
● Partitioning algorithms :
● Example 1
● Reference:
1. Chapter 8: Data mining: Concepts and Techniques:
Jiawei Han and Micheline Kamber, Morgan Kaufmann
2. http://en.wikipedia.org/wiki/Cluster_analysis
3. http://home.dei.polimi.it//matteucc/clustering/tutorial_html/heirarchical.html
THANK YOU