What Is Cluster Analysis?
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Intra-cluster distances are minimized.
Inter-cluster distances are maximized.
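To make the two quantities concrete, here is a minimal sketch that measures them on toy data. The helper names `mean_intra_cluster_distance` and `mean_inter_cluster_distance`, the use of average pairwise Euclidean distance, and the sample points are all assumptions chosen for illustration; other definitions of intra- and inter-cluster distance are equally valid.

```python
from itertools import combinations, product
from math import dist


def mean_intra_cluster_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_cluster_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Toy data (hypothetical): two tight, well-separated groups in 2-D.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra_a = mean_intra_cluster_distance(cluster_a)
inter_ab = mean_inter_cluster_distance(cluster_a, cluster_b)
print(f"intra(A) = {intra_a:.3f}, inter(A, B) = {inter_ab:.3f}")
```

A good clustering of this data keeps the first number small relative to the second.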
K-means Clustering
K-means is a partitional clustering approach.
Each cluster is associated with a centroid (center point).
Each point is assigned to the cluster with the closest centroid.
The number of clusters, K, must be specified.
The basic algorithm is very simple.
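The basic algorithm can be sketched as follows, assuming the usual formulation: select K points as initial centroids, then repeatedly assign every point to its closest centroid and recompute each centroid, until the centroids no longer change. The function names `kmeans` and `mean_point`, the `max_iter` and `seed` parameters, and the sample points are assumptions made for this sketch, not part of the original material.

```python
import random
from math import dist


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means with Euclidean distance; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Select K of the data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (hypothetical): two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which group receives label 0 and which receives label 1 depends on the random initial centroids, so only the grouping itself is meaningful.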
Initial centroids are often chosen randomly.
The centroid is (typically) the mean of the points in the cluster.
Closeness is measured by Euclidean distance, cosine similarity, correlation, etc.
K-means will converge for the common similarity measures mentioned above.
Most of the convergence happens in the first few iterations.
Complexity is O(n * K * I * d), where n is the number of points, K the number of clusters, I the number of iterations, and d the number of attributes.
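As an illustration of how the choice of closeness measure matters, the sketch below computes Euclidean distance and cosine similarity for the same pair of points. The function names and the sample vectors are assumptions made for this example.

```python
from math import dist, sqrt


def euclidean_distance(p, q):
    """Straight-line distance; smaller means closer."""
    return dist(p, q)


def cosine_similarity(p, q):
    """Cosine of the angle between p and q; larger means closer.

    Assumes neither vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


# Hypothetical vectors: y points in the same direction as x but is twice as long.
x = (1.0, 2.0)
y = (2.0, 4.0)
d_xy = euclidean_distance(x, y)
s_xy = cosine_similarity(x, y)
print(f"Euclidean distance = {d_xy:.3f}, cosine similarity = {s_xy:.3f}")
```

The two points are clearly apart under Euclidean distance yet maximally similar under cosine similarity, because cosine similarity ignores vector length. The measure chosen therefore changes which centroid counts as "closest".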
Often the stopping condition is changed to "until relatively few points change clusters".
Because the initial centroids are chosen randomly, the clusters produced vary from one run to another.
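Both points can be sketched together: the loop below stops once the fraction of points that changed cluster falls to a threshold, and the run is repeated with several random seeds so the results can be compared. The name `kmeans_relaxed`, the `min_changed` threshold of 1%, the use of the sum of squared errors (SSE) to compare runs, and the sample data are all assumptions of this sketch.

```python
import random
from math import dist


def kmeans_relaxed(points, k, seed, min_changed=0.01, max_iter=100):
    """K-means that stops when few points change clusters; returns (labels, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        # Relaxed stopping condition: relatively few points changed clusters.
        if changed / len(points) <= min_changed:
            break
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave the centroid of an empty cluster unchanged
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances from each point to its cluster's centroid.
    sse = sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


# Toy data (hypothetical): three small groups.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
          (10.0, 0.0), (10.5, 0.0), (10.0, 0.5)]

# Different seeds give different initial centroids, so results may differ.
runs = [kmeans_relaxed(points, k=3, seed=s) for s in range(5)]
best_labels, best_sse = min(runs, key=lambda run: run[1])
print([round(sse, 3) for _, sse in runs], "best:", round(best_sse, 3))
```

Keeping the run with the lowest SSE is one common way to cope with run-to-run variation, though it is not the only one.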
Limitations of K-means
K-means has problems when clusters differ in size or density, or have non-globular shapes.
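The size limitation can be seen on a small one-dimensional example. The sketch below is an assumption-laden toy: the function `kmeans_1d`, the data values, and the choice to start from the true cluster means are all made up for illustration. It places one wide cluster next to one tight cluster and runs K-means with K = 2.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """K-means on 1-D data from given initial centroids; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
                      for v in values]
        if new_labels == labels:  # no point changed cluster
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels


# Hypothetical data: a wide cluster (0..10) next to a tight one (12.0..12.2).
wide = [float(i) for i in range(11)]
tight = [12.0, 12.1, 12.2]
values = wide + tight

# Start from the true cluster means, the most favorable initialization.
start = [sum(wide) / len(wide), sum(tight) / len(tight)]
centroids, labels = kmeans_1d(values, start)
print(labels)
```

Even from this favorable start, the values 8, 9, and 10 end up grouped with the tight cluster: the boundary between two centroids sits at their midpoint, which cuts into the wide cluster.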