
What Is Cluster Analysis?

K-means clustering is an unsupervised learning algorithm that partitions objects into K clusters, each defined by a centroid; every object belongs to the cluster with the nearest centroid. It starts from randomly chosen initial centroids, then alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, until the centroids stop changing. K-means performs poorly when clusters differ in size, density, or shape, or when outliers are present, and choosing a suitable K is difficult.

Uploaded by

tara345w
Copyright
© Attribution Non-Commercial (BY-NC)

What is Cluster Analysis?

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Intra-cluster distances are minimized.
Inter-cluster distances are maximized.
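
To make the two distance goals concrete, here is a minimal Python sketch that compares the average distance within one group to the average distance between two groups. The helper names `mean_intra_distance` and `mean_inter_distance` and the toy points are assumptions made for illustration; they do not come from the slides, and Euclidean distance is only one possible choice of measure.

```python
import math
from itertools import combinations


def mean_intra_distance(cluster):
    """Average pairwise Euclidean distance between points in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(cluster_a, cluster_b):
    """Average Euclidean distance between points in two different clusters."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical toy data: two tight, well-separated groups.
group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
group_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

intra = mean_intra_distance(group_a)
inter = mean_inter_distance(group_a, group_b)
print(intra, inter)
```

For a grouping like this one, the within-group average is much smaller than the between-group average, which is the pattern a good clustering aims for.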

K-means Clustering
Partitional clustering approach
Each cluster is associated with a centroid (center point)
Each point is assigned to the cluster with the closest centroid
Number of clusters, K, must be specified
The basic algorithm is very simple
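
The basic algorithm can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the slides' own code: the function name `kmeans`, the `max_iter` and `seed` parameters, the use of Euclidean distance, and the choice to keep the old centroid when a cluster becomes empty are all assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initial centroids: k points chosen at random from the data.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which cluster receives label 0 depends on the random initialization, so only the grouping of the points is meaningful, not the label values themselves.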

K-means Clustering Details


Initial centroids are often chosen randomly.
Clusters produced vary from one run to another.
The centroid is (typically) the mean of the points in the cluster.
Closeness is measured by Euclidean distance, cosine similarity, correlation, etc.
K-means will converge for the common similarity measures mentioned above.
Most of the convergence happens in the first few iterations.
Often the stopping condition is changed to "until relatively few points change clusters".
Complexity is O(n * K * I * d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes.
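
As a rough worked example of the complexity estimate, take hypothetical values n = 10,000 points, K = 5 clusters, I = 20 iterations, and d = 3 attributes. These numbers are assumptions chosen for illustration only, and the result is an order-of-magnitude count of basic operations, not a running time.

```latex
n \cdot K \cdot I \cdot d
  = 10{,}000 \times 5 \times 20 \times 3
  = 3{,}000{,}000 = 3 \times 10^{6}
```

Doubling any one of the four quantities doubles the estimate, since the cost is linear in each of them.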

Limitations of K-means
K-means has problems when clusters differ in size or density, or when they have non-globular shapes.

K-means has problems when the data contains outliers.
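
One way to see the outlier problem: because a centroid is typically the mean of its points, a single distant point can pull it far from the rest of the cluster. The sketch below uses a hypothetical helper `centroid` and made-up points to show the effect.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


# Hypothetical cluster of three nearby points.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
print(centroid(cluster))                     # (2.0, 2.0)

# The same cluster with one outlier added.
print(centroid(cluster + [(100.0, 100.0)]))  # (26.5, 26.5)
```

With the outlier included, the centroid no longer sits near any of the original three points.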

The number of clusters (K) is difficult to determine.
