Partitioning Methods
Center-based
A cluster is a set of objects such that each object in the cluster is closer (more
similar) to the “center” of its own cluster than to the center of any other cluster.
The center of a cluster is often a centroid, the average of all the points in the
cluster, or a medoid, the most “representative” point of the cluster.
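The two kinds of center can be illustrated with a minimal Python sketch. The function names (centroid, medoid, nearest_center) and the sample points are illustrative assumptions, and Euclidean distance is assumed as the dissimilarity measure.

```python
import math

def centroid(points):
    """Component-wise mean of the points; it need not be an actual data point."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def medoid(points):
    """The data point with the smallest total distance to all points in the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

def nearest_center(point, centers):
    """Index of the closest center (the center-based membership rule)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical cluster: four nearby points plus one distant point.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (6.0, 6.0)]
c = centroid(cluster)  # an averaged position, not one of the data points
m = medoid(cluster)    # always one of the data points
label = nearest_center((1.5, 1.5), [(2.0, 2.0), (9.0, 9.0)])
```

The centroid is a computed position, while the medoid is always an existing object, which is why a medoid can still be used when an average of the objects is not meaningful.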
1. Partitioning methods:
1. k-means: The k-means algorithm partitions the data so that each cluster’s center is
represented by the mean value of the objects in the cluster.
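A minimal sketch of the usual k-means iteration (assign each object to the nearest mean, then recompute the means) is given here. It assumes Euclidean distance and initial centers supplied by the caller; the names assign, kmeans, and max_iter are illustrative and not taken from these notes.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]

def kmeans(points, initial_centers, max_iter=100):
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Update step: the center becomes the mean of its members.
                new_centers.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(len(old))))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(old)
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(data, initial_centers=[(1.0, 1.0), (8.0, 8.0)])
```

In practice the result depends on the initial centers, so the algorithm is commonly run several times with different starting points.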
K-Means Properties: Advantages
Disadvantages
• Can be applied only when the mean of a cluster is defined.
• It is sensitive to noise and outlier data points, because extreme values can distort the mean.
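2. k-medoids: The k-medoids algorithm (PAM, Partitioning Around Medoids) represents each
cluster by one of its own objects, the medoid, which serves as the cluster’s reference point.
• The algorithm minimizes the sum of the dissimilarities between each object and
its corresponding reference point.
• E: the sum of the absolute error over all objects in the data set, that is, the sum of
the dissimilarity between each object and the reference point of its cluster.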
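The criterion E and the medoid-swapping idea can be sketched as follows. This is a simplified variant that accepts the first swap that lowers E, whereas textbook PAM evaluates all swaps and applies the best one. The names total_cost and pam and the sample data are illustrative assumptions, and Euclidean distance is assumed as the dissimilarity.

```python
import math
from itertools import product

def total_cost(points, medoids):
    """Absolute-error criterion E: each point contributes its distance to the nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, initial_medoids):
    """Swap a medoid with a non-medoid whenever that lowers E; stop when no swap helps."""
    medoids = list(initial_medoids)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(len(medoids)), points):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial)
            if cost < best:  # strictly lower E, so the loop must terminate
                medoids, best, improved = trial, cost, True
    return medoids, best

# Hypothetical data: two groups; both initial medoids start in the same group.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
medoids, E = pam(data, initial_medoids=[(1.0, 2.0), (2.0, 1.0)])
```

Every candidate swap requires recomputing E over the whole data set, which is the source of the poor scalability on large data sets.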
Advantages
The k-medoids method is more robust than k-means in the presence of noise and
outliers, because a medoid is less influenced by extreme values than a mean.
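A small one-dimensional illustration of this robustness, with made-up values and a hypothetical helper medoid_1d: adding a single outlier pulls the mean far away, while the medoid stays where it was.

```python
from statistics import mean

def medoid_1d(values):
    """The value with the smallest total absolute distance to all values."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))

clean = [1, 2, 3, 4, 5]
noisy = clean + [100]  # the same data plus one outlier

mean_clean, medoid_clean = mean(clean), medoid_1d(clean)
mean_noisy, medoid_noisy = mean(noisy), medoid_1d(noisy)

print("mean:  ", mean_clean, "->", round(mean_noisy, 2))
print("medoid:", medoid_clean, "->", medoid_noisy)
```

Here 3 and 4 tie for the smallest total distance in the noisy data, and min returns the first of the two.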
Disadvantages
• The k-medoids partitioning algorithm works effectively for small data sets but does not
scale well to large data sets.
3. CLARA (Clustering Large Applications)
To deal with large data sets, a sampling-based method called CLARA can be used.
CLARA draws multiple samples of the data set, applies PAM to each sample, and returns the
best clustering found.
Strength: deals with larger data sets than PAM and performs better than PAM on them.
Efficiency depends on the sample size.
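The sampling procedure described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the inner pam is a simplified first-improvement variant, and the names clara, n_samples, sample_size, and seed, their default values, and the synthetic data are all assumptions made for the example.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    """Simplified PAM: accept any medoid/non-medoid swap that lowers the cost."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=20, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)       # cluster the sample only
        cost = total_cost(points, medoids)  # but score on the full data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Hypothetical data: two 10 x 10 grids of points, far apart from each other.
blob_a = [(float(x), float(y)) for x in range(10) for y in range(10)]
blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
points = blob_a + blob_b
medoids, cost = clara(points, k=2)
```

PAM only ever sees sample_size objects at a time, so its cost no longer grows with the full data set; the trade-off is that the medoids can only be chosen from the sampled objects.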
4. CLARANS (Clustering Large Applications based upon Randomized Search)
Advantages
• Experiments show that CLARANS is more effective than both PAM and CLARA
• Handles outliers.
Disadvantages: The clustering quality depends on the sampling method.
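A minimal sketch of the randomized search idea behind CLARANS: start from a random set of medoids, try randomly chosen single swaps, move whenever a swap lowers the cost, and restart a few times. The parameter names num_local and max_neighbor follow the names commonly used for this algorithm; their default values, the seed argument, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def clarans(points, k, num_local=3, max_neighbor=50, seed=0):
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(num_local):              # independent restarts
        current = rng.sample(points, k)     # random starting set of medoids
        current_cost = total_cost(points, current)
        failures = 0
        while failures < max_neighbor:
            i = rng.randrange(k)            # which medoid to replace
            candidate = rng.choice(points)  # random replacement object
            if candidate in current:
                failures += 1               # counted so the loop always ends
                continue
            neighbor = current[:i] + [candidate] + current[i + 1:]
            cost = total_cost(points, neighbor)
            if cost < current_cost:
                # Move to the better neighbor and reset the failure counter.
                current, current_cost, failures = neighbor, cost, 0
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return best_medoids, best_cost

# Hypothetical data: two 10 x 10 grids of points, far apart from each other.
blob_a = [(float(x), float(y)) for x in range(10) for y in range(10)]
blob_b = [(x + 100.0, y + 100.0) for x, y in blob_a]
points = blob_a + blob_b
medoids, cost = clarans(points, k=2)
```

Unlike CLARA, the search is not confined to a fixed sample: any object in the data set can become a medoid, and only the neighbors examined at each step are chosen at random.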